Module importUKAQ

Functions

def get_metadata(source='aurn', force_update=False) ‑> None

Get the metadata file for use in the project. Metadata files contain information on the location of each site, the data available, and the dates between which data acquisition has taken place. The metadata should not need to be fetched more than once in a session, so it is only fetched if it does not already exist, unless force_update is set to True. An end date of 'ongoing' is replaced with the current date and an additional ongoing flag column is set; ratified_to values of 'Never' are set to 1900-01-01 so that date columns can be handled as datetime objects.

Args

source : str, optional
The network required. Defaults to 'aurn'.
force_update : bool, optional
Whether to force an update of the network metadata. Defaults to False.
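The date normalisation described above can be sketched with pandas. Column names (`end_date`, `ratified_to`) and values here are hypothetical, assumed from the description; this is a sketch of the tidy-up, not the module's implementation:

```python
import pandas as pd

# Hypothetical metadata frame; column names assumed from the description above.
meta = pd.DataFrame({
    "site_id": ["MY1", "ABD"],
    "end_date": ["ongoing", "2015-12-31"],
    "ratified_to": ["Never", "2015-12-31"],
})

# Flag ongoing sites, then replace 'ongoing' with the current date.
meta["ongoing"] = meta["end_date"] == "ongoing"
meta.loc[meta["ongoing"], "end_date"] = pd.Timestamp.now().strftime("%Y-%m-%d")

# 'Never' ratified_to becomes 1900-01-01 so the column parses as datetime.
meta.loc[meta["ratified_to"] == "Never", "ratified_to"] = "1900-01-01"
meta["end_date"] = pd.to_datetime(meta["end_date"])
meta["ratified_to"] = pd.to_datetime(meta["ratified_to"])
```

Handling the sentinel strings before conversion means downstream code can filter and sort on plain datetime columns.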
def get_parameters(source) ‑> list

Returns a list of available parameter names. Note that not all parameters will be available at all sites.

Args

source : str
the source to query

Returns

list
a list of available parameter names
def get_sites(source) ‑> pandas.core.frame.DataFrame

The network metadata contains multiple entries for each site because it lists every pollutant measured. This function returns a dataframe with the location and id information only. Note that the presence of a site in the list does not mean data will be available, as the years of operation and the pollutants measured at each site differ.

Args

source : str, optional
The source network for the data.

Returns

pd.DataFrame
A dataframe of the core location attributes of the sites.
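The deduplication this implies can be sketched as follows. The column names are hypothetical placeholders, not the module's actual schema:

```python
import pandas as pd

# Hypothetical metadata: one row per (site, pollutant) pair.
meta = pd.DataFrame({
    "site_id": ["MY1", "MY1", "ABD"],
    "site_name": ["Marylebone Road", "Marylebone Road", "Aberdeen"],
    "latitude": [51.52, 51.52, 57.16],
    "longitude": [-0.15, -0.15, -2.09],
    "pollutant": ["no2", "pm10", "no2"],
})

# Keep one row per site, retaining the location and id columns only.
sites = (meta[["site_id", "site_name", "latitude", "longitude"]]
         .drop_duplicates(subset="site_id")
         .reset_index(drop=True))
```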
def guess_all(year, source, pollutant_list)

For a given source, year range and pollutant list, returns the list of sites with data for the given range. When reading from an HDF file only the year is checked, not the pollutants.

Args

year : list
years to be covered
source : str
data source identifier, e.g. 'aurn'
pollutant_list : list
list of target pollutants

Returns

list
list of available sites with required data
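One plausible way to select qualifying sites from per-pollutant metadata, assuming hypothetical `start_year`/`end_year` columns (the module's actual filtering logic may differ):

```python
import pandas as pd

# Hypothetical per-pollutant metadata with years of operation.
meta = pd.DataFrame({
    "site_id": ["MY1", "MY1", "ABD"],
    "pollutant": ["no2", "pm10", "no2"],
    "start_year": [1997, 1997, 2010],
    "end_year": [2024, 2024, 2024],
})

years = [2000, 2001]
pollutants = ["no2"]

# A site qualifies if, for a requested pollutant, its record spans all years.
mask = (meta["pollutant"].isin(pollutants)
        & (meta["start_year"] <= min(years))
        & (meta["end_year"] >= max(years)))
sites = sorted(meta.loc[mask, "site_id"].unique())
# ABD is excluded because its no2 record only starts in 2010.
```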
def importUKAQ(site,
year,
source,
data_type='hourly',
pollutant='any',
to_narrow=False,
verbose=False,
progress=True,
write_raw_toHDF=False) ‑> pandas.core.frame.DataFrame

Downloads and parses RData files from open air quality networks into a dataframe. If multiple monitoring sites are specified along with multiple years, all years for all sites are downloaded. Also contains options to save raw downloads to an HDF5 file, which can be reloaded with the same function. The subtype of data to be loaded can also be defined.

Args

site : str or list
A code or list of codes representing the monitoring site(s), e.g. 'my1'. Alternatively, 'all' will get all sites that meet the year and pollutant criteria.
If more than one source is being used, an attempt is made to coerce the site list so that each group of sites is linked to its source, e.g. site1 and site2 from source 1
and all sites from source 2.
year : int, str or list
The year(s) to be recovered. Can be a single year or a list of years. Alternatively,
a string, e.g. '2000:2005', will be translated into an inclusive list.
source : str or list
the source of the data, e.g. 'aurn'. Alternatively, set to a local file path to read from a previously created local HDF5 file.
If a list of sources is provided then the site argument must also be a list of the same length (which can be a list of lists if more than one site is needed from each source)
data_type : str, optional
The data type (frequency of observation) to recover. Not all frequencies are available from all sources. Defaults to 'hourly'.
pollutant : str or list, optional
list of pollutants to be returned. Also used in guessing sites if 'all' sites are specified. A value of 'any' keeps all data. Note that all data is downloaded regardless
of this value - the selection takes place after download. Defaults to 'any'.
to_narrow : bool, optional
Melt the output dataframe so that it is returned with a
column identifying the pollutant name and a column containing the corresponding concentration/statistic. Defaults to False.
verbose : bool, optional
if True, prints which data is being downloaded. Defaults to False.
progress : bool, optional
Shows a progress bar for downloads. Defaults to True.
write_raw_toHDF : bool or string, optional
If a string file path is provided, then an HDF5 file containing all the downloaded data will
be written locally. Useful to enable rereading of all data without redownloading. Defaults to False.
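The to_narrow reshaping described above corresponds to a pandas melt. The column names here are hypothetical; this shows the shape transformation, not the module's exact output:

```python
import pandas as pd

# Wide data, one column per pollutant (hypothetical columns).
wide = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00"]),
    "site": ["MY1", "MY1"],
    "no2": [40.1, 38.5],
    "pm10": [21.0, 19.4],
})

# to_narrow=True corresponds to melting pollutant columns into
# a pollutant-name column and a value column.
narrow = wide.melt(id_vars=["date", "site"],
                   var_name="pollutant", value_name="value")
```

The narrow form is often easier to group, filter, and plot by pollutant.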

Raises

TypeError
for incorrectly defined sites or year
ValueError
if source or data_type are not allowed values

Returns

pd.DataFrame
concatenated pandas dataframe of the data_type requested
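The year-argument handling described above (a single year, a list, or an inclusive 'start:end' string) can be sketched as a small helper. `expand_years` is a hypothetical name for illustration, not part of the module's API:

```python
def expand_years(year):
    """Translate a year argument into a list of ints.

    Accepts an int, a list of years, or a 'start:end' string,
    mirroring the year handling described above (sketch only).
    """
    if isinstance(year, str) and ":" in year:
        start, end = (int(part) for part in year.split(":"))
        return list(range(start, end + 1))  # inclusive of the end year
    if isinstance(year, int):
        return [year]
    return [int(y) for y in year]
```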