Module importUKAQ

Functions

def get_metadata(source='aurn', force_update=False) ‑> None

Get the metadata file for use in the project. Metadata files contain information on the location of each site, the data available, and the dates between which data acquisition has taken place. The metadata should not need to be fetched more than once in a session, so it is only fetched if it does not already exist, unless force_update is set to True. An end date of 'ongoing' is replaced with the current date and an additional ongoing flag column is set; ratified_to values of 'Never' are set to 1900-01-01 so that date columns can be handled as datetime objects.

Args

source : str, optional
The network required. Defaults to 'aurn'.
force_update : bool, optional
Whether to force an update of the network metadata. Defaults to False.
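The date normalisation described above can be sketched with pandas. Column names (`end_date`, `ratified_to`) and values here are hypothetical, assumed from the description; this is a sketch of the tidy-up, not the module's implementation:

```python
import pandas as pd

# Hypothetical metadata frame; column names assumed from the description above.
meta = pd.DataFrame({
    "site_id": ["MY1", "ABD"],
    "end_date": ["ongoing", "2015-12-31"],
    "ratified_to": ["Never", "2015-12-31"],
})

# Flag ongoing sites, then replace 'ongoing' with the current date.
meta["ongoing"] = meta["end_date"] == "ongoing"
meta.loc[meta["ongoing"], "end_date"] = pd.Timestamp.now().strftime("%Y-%m-%d")

# 'Never' ratified_to becomes 1900-01-01 so the column parses as datetime.
meta.loc[meta["ratified_to"] == "Never", "ratified_to"] = "1900-01-01"
meta["end_date"] = pd.to_datetime(meta["end_date"])
meta["ratified_to"] = pd.to_datetime(meta["ratified_to"])
```

Handling the sentinel strings before conversion means downstream code can filter and sort on plain datetime columns.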
def get_parameters(source) ‑> list

Returns a list of available parameter names. Note that not all parameters will be available at all sites.

Args

source : str
the source to query

Returns

list
a list of available parameter names
def get_sites(source) ‑> pandas.core.frame.DataFrame

The network metadata contains multiple entries for each site because it lists every pollutant measured. This function returns a dataframe with the location and id information only. Note that the presence of a site in the list does not mean data will be available, as the years of operation and the pollutants measured at each site differ.

Args

source : str, optional
The source network for the data.

Returns

pd.DataFrame
A dataframe of the core location attributes of the sites.
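The deduplication this implies can be sketched as follows. The column names are hypothetical placeholders, not the module's actual schema:

```python
import pandas as pd

# Hypothetical metadata: one row per (site, pollutant) pair.
meta = pd.DataFrame({
    "site_id": ["MY1", "MY1", "ABD"],
    "site_name": ["Marylebone Road", "Marylebone Road", "Aberdeen"],
    "latitude": [51.52, 51.52, 57.16],
    "longitude": [-0.15, -0.15, -2.09],
    "pollutant": ["no2", "pm10", "no2"],
})

# Keep one row per site, retaining the location and id columns only.
sites = (meta[["site_id", "site_name", "latitude", "longitude"]]
         .drop_duplicates(subset="site_id")
         .reset_index(drop=True))
```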
def guess_all(year, source, pollutant_list)

For a given source, year range and pollutant list, returns the list of sites with data for the given range. When reading from an HDF file only the year is checked, not the pollutants.

Args

year : list
years to be covered
source : str
data source identifier, e.g. 'aurn'
pollutant_list : list
list of target pollutants

Returns

list
list of available sites with required data
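One plausible way to select qualifying sites from per-pollutant metadata, assuming hypothetical `start_year`/`end_year` columns (the module's actual filtering logic may differ):

```python
import pandas as pd

# Hypothetical per-pollutant metadata with years of operation.
meta = pd.DataFrame({
    "site_id": ["MY1", "MY1", "ABD"],
    "pollutant": ["no2", "pm10", "no2"],
    "start_year": [1997, 1997, 2010],
    "end_year": [2024, 2024, 2024],
})

years = [2000, 2001]
pollutants = ["no2"]

# A site qualifies if, for a requested pollutant, its record spans all years.
mask = (meta["pollutant"].isin(pollutants)
        & (meta["start_year"] <= min(years))
        & (meta["end_year"] >= max(years)))
sites = sorted(meta.loc[mask, "site_id"].unique())
# ABD is excluded because its no2 record only starts in 2010.
```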
def importUKAQ(site,
year,
source,
data_type='hourly',
pollutant='any',
to_narrow=False,
verbose=False,
progress=True,
write_raw_toHDF=False) ‑> pandas.core.frame.DataFrame

Downloads and parses RData files from open air quality networks into a dataframe. If multiple monitoring sites are specified along with multiple years, all years for all sites are downloaded. Also contains options to save raw downloads to an HDF5 file, which can be reloaded with the same function. The subtype of data to be loaded can also be defined.

Args

site : str or list
A code or list of codes representing the monitoring site(s), e.g. 'my1'. Alternatively, 'all' will get all sites that meet the year and pollutant criteria.
If more than one source is being used, an attempt is made to coerce the site list so that each group of sites is linked to its source, e.g. site1 and site2 from source 1
and all sites from source 2.
year : int, str or list
The year(s) to be recovered. Can be a single year or a list of years. Alternatively,
a string, e.g. '2000:2005', will be translated into an inclusive list.
source : str or list
the source of the data, e.g. 'aurn'. Alternatively, set to a local file path to read from a previously created local HDF5 file.
If a list of sources is provided then the site argument must also be a list of the same length (which can be a list of lists if more than one site is needed from each source)
data_type : str, optional
The data type (frequency of observation) to recover. Not all frequencies are available from all sources. Defaults to 'hourly'.
pollutant : str or list, optional
list of pollutants to be returned. Also used in guessing sites if 'all' sites are specified. A value of 'any' keeps all data. Note that all data is downloaded regardless
of this value - the selection takes place after download. Defaults to 'any'.
to_narrow : bool, optional
Melt the output dataframe so that it is returned with a
column identifying the pollutant name and a column containing the corresponding concentration/statistic. Defaults to False.
verbose : bool, optional
if True, prints which data is being downloaded. Defaults to False.
progress : bool, optional
Shows a progress bar for downloads. Defaults to True.
write_raw_toHDF : bool or string, optional
If a string file path is provided, then an HDF5 file containing all the downloaded data will
be written locally. Useful to enable rereading of all data without redownloading. Defaults to False.
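The to_narrow reshaping described above corresponds to a pandas melt. The column names here are hypothetical; this shows the shape transformation, not the module's exact output:

```python
import pandas as pd

# Wide data, one column per pollutant (hypothetical columns).
wide = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00"]),
    "site": ["MY1", "MY1"],
    "no2": [40.1, 38.5],
    "pm10": [21.0, 19.4],
})

# to_narrow=True corresponds to melting pollutant columns into
# a pollutant-name column and a value column.
narrow = wide.melt(id_vars=["date", "site"],
                   var_name="pollutant", value_name="value")
```

The narrow form is often easier to group, filter, and plot by pollutant.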

Raises

TypeError
for incorrectly defined sites or year
ValueError
if source or data_type are not allowed values

Returns

pd.DataFrame
concatenated pandas dataframe of the data_type requested
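The year-argument handling described above (a single year, a list, or an inclusive 'start:end' string) can be sketched as a small helper. `expand_years` is a hypothetical name for illustration, not part of the module's API:

```python
def expand_years(year):
    """Translate a year argument into a list of ints.

    Accepts an int, a list of years, or a 'start:end' string,
    mirroring the year handling described above (sketch only).
    """
    if isinstance(year, str) and ":" in year:
        start, end = (int(part) for part in year.split(":"))
        return list(range(start, end + 1))  # inclusive of the end year
    if isinstance(year, int):
        return [year]
    return [int(y) for y in year]
```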