.. _api_tutorial: Using the OpenAQ API ==================== The ``openaq`` api is an easy-to-use wrapper built around the `OpenAQ Api `__. Complete API documentation can be found on their website. There are no keys or rate limits (as of March 2017), so working with the API is straight forward. If building a website or app, you may want to just use the python wrapper and interact with the data in json format. However, the rest of this tutorial will assume you are interested in analyzing the data. To get more out of it, I recommend installing ``seaborn`` for manipulating the asthetics of plots, and working with data as DataFrames using ``pandas``. For more information on these, check out the installation section of this documentation. From this point forward, I assume you have at least a basic knowledge of python and matplotlib. This documentation was built using the following versions of all packages: .. code:: ipython3 import pandas as pd import seaborn as sns import matplotlib as mpl import matplotlib.pyplot as plt import openaq import warnings warnings.simplefilter('ignore') %matplotlib inline # Set major seaborn asthetics sns.set("notebook", style='ticks', font_scale=1.0) # Increase the quality of inline plots mpl.rcParams['figure.dpi']= 500 print ("pandas v{}".format(pd.__version__)) print ("matplotlib v{}".format(mpl.__version__)) print ("seaborn v{}".format(sns.__version__)) print ("openaq v{}".format(openaq.__version__)) .. parsed-literal:: pandas v0.21.0 matplotlib v2.1.0 seaborn v0.8.1 openaq v1.1.0 OpenAQ API ---------- The OpenAQ APi has only eight endpoints that we are interested in: - cities: provides a simple listing of cities within the platforms - countries: provides a simple listing of countries within the platform - fetches: providing data about individual fetch operations that are used to populate data in the platform - latest: provides the latest value of each available parameter for every location in the system - locations: provides a list of measurement locations and their meta data - measurements: provides data about individual measurements - parameters: provides a simple listing of parameters within the platform - sources: provides a list of data sources For detailed documentation about each one in the context of this API wrapper, please check out the API documentation. Your First Request ~~~~~~~~~~~~~~~~~~ Real quick, let’s go ahead and initiate an instance of the ``openaq.OpenAQ`` class so we can begin looking at data: .. code:: ipython3 api = openaq.OpenAQ() Cities ~~~~~~ The cities API endpoint lists the cities available within the platform. Results can be subselected by country and paginated to retrieve all results in the database. Let’s start by performing a basic query with an increased limit (so we can get all of them) and return it as a DataFrame: .. code:: ipython3 resp = api.cities(df=True, limit=10000) # display the first 10 rows resp.info() .. parsed-literal:: RangeIndex: 2020 entries, 0 to 2019 Data columns (total 4 columns): city 2020 non-null object count 2020 non-null int64 country 2020 non-null object locations 2020 non-null int64 dtypes: int64(2), object(2) memory usage: 63.2+ KB So we retrieved 1400+ entries from the database. We can then take a look at them: .. code:: ipython3 print (resp.head(10)) .. parsed-literal:: city count country locations 0 Escaldes-Engordany 13324 AD 2 1 unused 314 AD 1 2 Abu Dhabi 477 AE 1 3 Buenos Aires 14976 AR 4 4 Amt der K�rntner Landesregierung 104663 AT 16 5 Austria 121987 AT 174 6 Amt der Ober�sterreichischen Landesregierung 154329 AT 16 7 Amt der Burgenl�ndischen Landesregierung 35608 AT 3 8 Amt der Salzburger Landesregierung 94197 AT 12 9 Amt der Burgenländischen Landesregierung 471 AT 1 Let’s try to find out which ones are in India: .. code:: ipython3 print (resp.query("country == 'IN'")) .. parsed-literal:: city count country locations 841 Mumbai 309477 IN 3 842 Kota 23264 IN 2 843 Delhi 1140259 IN 35 844 Ghaziabad 99087 IN 2 845 Barddhaman 2470 IN 3 846 Asansol 1590 IN 2 847 Lucknow 271912 IN 5 848 Muzaffarpur 116841 IN 1 849 Hyderabad 465962 IN 15 850 Visakhapatnam 208237 IN 8 851 Amritsar 77849 IN 1 852 Bhiwadi 20846 IN 1 853 Kolkata 168542 IN 7 854 Bengaluru 371649 IN 8 855 Howrah 50263 IN 4 856 Navi Mumbai 7725 IN 1 857 Ahmedabad 57714 IN 2 858 Faridabad 113546 IN 2 859 Nashik 76217 IN 4 860 Haldia 115284 IN 2 861 Thiruvananthapuram 46124 IN 2 862 Rohtak 95050 IN 1 863 Medak 2671 IN 1 864 Pune 145450 IN 1 865 Tirupati 159117 IN 4 866 Ajmer 25266 IN 2 867 Vijayawara 34902 IN 1 868 Durgapur 78761 IN 2 869 Jorapokhar 35558 IN 1 870 NOIDA 12061 IN 1 .. ... ... ... ... 873 Gaya 76810 IN 1 874 Chandrapur 232244 IN 2 875 Chennai 290415 IN 4 876 Siliguri 30 IN 2 877 Thane 130025 IN 3 878 Nagpur 72328 IN 5 879 Mandideep 7847 IN 1 880 Patna 75338 IN 1 881 Dhanbad 3 IN 1 882 Aurangabad 113529 IN 1 883 Kanpur 159678 IN 2 884 Moradabad 24595 IN 1 885 Chittoor 2013 IN 1 886 Alwar 14017 IN 1 887 Vijayawada 11624 IN 2 888 Udaipur 25699 IN 1 889 Jaipur 190441 IN 6 890 Ludhiana 72308 IN 1 891 Varanasi 181673 IN 1 892 Pali 23627 IN 2 893 Ujjain 16924 IN 1 894 Singrauli 14636 IN 1 895 Jodhpur 151172 IN 1 896 Agra 84301 IN 1 897 Dewas 11671 IN 1 898 Amaravati 11432 IN 1 899 Mandi Gobindgarh 48579 IN 1 900 Solapur 253940 IN 1 901 Pithampur 12173 IN 1 902 Gurgaon 147910 IN 1 [62 rows x 4 columns] Great! For the rest of the tutorial, we are going to focus on Delhi, India. Why? Well..because there are over 500,000 data points and my personal research is primarily in India. We will also take a look at some :math:`SO_2` data from Hawai’i later on (another great research locale). Countries --------- Similar to the ``cities`` endpoint, the ``countries`` endpoint lists the countries available. The only parameters we have to play with are the limit and page number. If we want to grab them all, we can just up the limit to the maximum (10000). .. code:: ipython3 res = api.countries(limit=10000, df=True) print (res.head()) .. parsed-literal:: cities code count locations name 0 2 AD 13638 3 Andorra 1 1 AR 14976 4 Argentina 2 18 AU 3248142 99 Australia 3 16 AT 1521351 306 Austria 4 1 BH 13820 1 Bahrain Fetches ------- If you are interested in getting information pertaining to the individual data fetch operations, go ahead and use this endpoint. Most people won’t need to use this. This API method does not allow the ``df`` parameter; if you would like it to be added, drop me a message. Otherwise, here is how you can access the json-formatted data: .. code:: ipython3 status, resp = api.fetches(limit=1) # Print out the meta info resp['meta'] .. parsed-literal:: {'found': 92506, 'license': 'CC BY 4.0', 'limit': 1, 'name': 'openaq-api', 'page': 1, 'pages': 92506, 'website': 'https://docs.openaq.org/'} Parameters ---------- The ``parameters`` endpoint will provide a listing off all the parameters available: .. code:: ipython3 res = api.parameters(df=True) print (res) .. parsed-literal:: description id name \ 0 Black Carbon bc BC 1 Carbon Monoxide co CO 2 Nitrogen Dioxide no2 NO2 3 Ozone o3 O3 4 Particulate matter less than 10 micrometers in... pm10 PM10 5 Particulate matter less than 2.5 micrometers i... pm25 PM2.5 6 Sulfur Dioxide so2 SO2 preferredUnit 0 µg/m³ 1 ppm 2 ppm 3 ppm 4 µg/m³ 5 µg/m³ 6 ppm Sources ------- The ``sources`` endpoint will provide a list of the sources where the raw data came from. .. code:: ipython3 res = api.sources(df=True) # Print out the first one res.ix[0] .. parsed-literal:: active True adapter arpalazio city NaN contacts [info@openaq.org] country IT description Air quality data from Lazio region, Italy location NaN name ARPALAZIO organization NaN region Lazio resolution NaN sourceURL http://www.arpalazio.net/ timezone NaN url http://www.arpalazio.net/main/aria/sci/annoinc... Name: 0, dtype: object Locations --------- The ``locations`` endpoint will return the list of measurement locations and their meta data. We can do quite a bit of querying with this one: Let’s see what the data looks like: .. code:: ipython3 res = api.locations(df=True) res.info() .. parsed-literal:: RangeIndex: 100 entries, 0 to 99 Data columns (total 11 columns): city 100 non-null object coordinates.latitude 100 non-null float64 coordinates.longitude 100 non-null float64 count 100 non-null int64 country 100 non-null object firstUpdated 100 non-null datetime64[ns] lastUpdated 100 non-null datetime64[ns] location 100 non-null object parameters 100 non-null object sourceName 100 non-null object sourceNames 100 non-null object dtypes: datetime64[ns](2), float64(2), int64(1), object(6) memory usage: 8.7+ KB .. code:: ipython3 # print out the first one res.ix[0] .. parsed-literal:: city Ulaanbaatar coordinates.latitude 47.9329 coordinates.longitude 106.921 count 294682 country MN firstUpdated 2015-09-01 00:00:00 lastUpdated 2018-01-24 13:15:00 location 100 ail parameters [pm10, no2, so2, o3, co] sourceName Agaar.mn sourceNames [Agaar.mn] Name: 0, dtype: object What if we just want to grab the locations in Delhi? .. code:: ipython3 res = api.locations(city='Delhi', df=True) res.ix[0] .. parsed-literal:: city Delhi coordinates.latitude 28.6508 coordinates.longitude 77.3152 count 102326 country IN distance 6.32199e+06 firstUpdated 2015-06-29 14:30:00 lastUpdated 2017-11-28 10:15:00 location Anand Vihar parameters [pm10, pm25, so2, o3, co, no2] sourceName CPCB sourceNames [Anand Vihar, CPCB] Name: 0, dtype: object What about just figuring out which locations in Delhi have :math:`PM_{2.5}` data? .. code:: ipython3 res = api.locations(city='Delhi', parameter='pm25', df=True) res.ix[0] .. parsed-literal:: city Delhi coordinates.latitude 28.6508 coordinates.longitude 77.3152 count 23891 country IN firstUpdated 2015-06-29 14:30:00 lastUpdated 2017-11-28 10:15:00 location Anand Vihar parameters [pm25] sourceName CPCB sourceNames [CPCB, Anand Vihar] Name: 0, dtype: object Latest ------ Grab the latest data from a location or locations. What was the most recent :math:`PM_{2.5}` data in Delhi? .. code:: ipython3 res = api.latest(city='Delhi', parameter='pm25', df=True) res.head() .. raw:: html
averagingPeriod.unit averagingPeriod.value city country location parameter sourceName unit value
lastUpdated
2017-11-28 10:15:00 hours 0.25 Delhi IN Anand Vihar pm25 CPCB b'\xc2\xb5g/m\xc2\xb3' 70.00
2018-01-24 09:45:00 hours 0.25 Delhi IN Anand Vihar, Delhi - DPCC pm25 CPCB b'\xc2\xb5g/m\xc2\xb3' 160.00
2018-01-24 01:15:00 hours 0.25 Delhi IN Aya Nagar, Delhi - IMD pm25 CPCB b'\xc2\xb5g/m\xc2\xb3' 192.84
2018-01-24 01:15:00 hours 0.25 Delhi IN Burari Crossing, Delhi - IMD pm25 CPCB b'\xc2\xb5g/m\xc2\xb3' 53.45
2018-01-24 01:15:00 hours 0.25 Delhi IN CRRI Mathura Road, Delhi - IMD pm25 CPCB b'\xc2\xb5g/m\xc2\xb3' 185.60
What about the most recent :math:`SO_2` data in Hawii? .. code:: ipython3 res = api.latest(city='Hilo', parameter='so2', df=True) res .. raw:: html
averagingPeriod.unit averagingPeriod.value city country location parameter sourceName unit value
lastUpdated
2018-01-24 06:00:00 hours 1 Hilo US Hawaii Volcanoes NP so2 AirNow ppm 0.000
2018-01-24 06:00:00 hours 1 Hilo US Hilo so2 AirNow ppm 0.001
2018-01-24 06:00:00 hours 1 Hilo US Kona so2 AirNow ppm 0.003
2018-01-24 06:00:00 hours 1 Hilo US Ocean View so2 AirNow ppm 0.002
2018-01-24 06:00:00 hours 1 Hilo US Pahala so2 AirNow ppm 0.021
2017-01-26 17:00:00 hours 1 Hilo US Puna E Station so2 AirNow ppm 0.002
Measurements ------------ Finally, the endpoint we’ve all been waiting for! Measurements allows you to grab all of the dataz! You can query on a whole bunhc of parameters listed in the API documentation. Let’s dive in: Let’s grab the past 10000 data points for :math:`PM_{2.5}` in Delhi: .. code:: ipython3 res = api.measurements(city='Delhi', parameter='pm25', limit=10000, df=True) # Print out the statistics on a per-location basiss res.groupby(['location'])['value'].describe() .. raw:: html
count mean std min 25% 50% 75% max
location
Anand Vihar, Delhi - DPCC 430.0 259.625581 147.207770 49.00 127.0000 217.000 345.0000 644.00
Aya Nagar, Delhi - IMD 372.0 149.792312 63.647824 3.00 106.8725 133.675 185.5775 362.89
Burari Crossing, Delhi - IMD 363.0 130.473774 54.497222 41.82 92.0150 119.260 157.5000 333.67
CRRI Mathura Road, Delhi - IMD 361.0 154.678227 106.200845 2.22 80.3300 141.950 200.2300 842.68
Delhi Technological University, Delhi - CPCB 1095.0 297.557991 154.132360 59.00 167.5000 286.000 401.0000 764.00
IGI Airport Terminal-3, Delhi - IMD 361.0 151.609806 73.086274 5.62 95.0700 143.700 200.3100 375.96
IHBAS, Delhi - CPCB 922.0 122.903471 45.207670 0.00 88.9000 118.400 147.4000 308.30
Income Tax Office, Delhi - CPCB 1095.0 210.021918 94.767962 0.00 141.5000 190.000 277.5000 477.00
Lodhi Road, Delhi - IMD 339.0 160.147699 72.395593 9.69 107.4550 151.380 208.2750 383.10
Mandir Marg, Delhi - DPCC 264.0 198.306818 87.983323 42.00 132.0000 186.000 254.0000 443.00
NSIT Dwarka, Delhi - CPCB 1030.0 217.039612 82.230388 0.00 155.9250 197.600 262.2000 527.10
North Campus, Delhi - IMD 368.0 212.198560 112.838311 0.41 123.1100 190.870 275.0175 633.12
Punjabi Bagh, Delhi - DPCC 318.0 227.411950 116.249854 47.00 125.0000 208.500 314.0000 559.00
Pusa, Delhi - IMD 382.0 125.018351 58.187861 28.51 77.6475 114.430 163.2325 320.82
R K Puram, Delhi - DPCC 400.0 208.190000 98.169711 64.00 139.0000 188.000 245.7500 593.00
Shadipur, Delhi - CPCB 1055.0 165.118104 104.136933 0.20 94.5000 139.000 214.8500 798.70
Sirifort, Delhi - CPCB 439.0 213.398633 98.462672 0.00 145.5000 195.000 277.5000 979.00
US Diplomatic Post: New Delhi 406.0 213.339901 139.081139 -999.00 144.0000 202.000 269.7500 1985.00
Clearly, we should be doing some serious data cleaning ;) Why don’t we go ahead and plot all of these locations on a figure. .. code:: ipython3 fig, ax = plt.subplots(1, figsize=(10, 6)) for group, df in res.groupby('location'): # Query the data to only get positive values and resample to hourly _df = df.query("value >= 0.0").resample('1h').mean() _df.value.plot(ax=ax, label=group) ax.legend(loc='best') ax.set_ylabel("$PM_{2.5}$ [$\mu g m^{-3}$]", fontsize=20) ax.set_xlabel("") sns.despine(offset=5) plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) plt.show() .. image:: api_files/api_34_0.png Don’t worry too much about how ugly and uninteresting the plot above is…we’ll take care of that in the next tutorial! Let’s go ahead and look at the distribution of :math:`PM_{2.5}` values seen in Delhi by various sensors. This is the same data as above, but viewed in a different way. .. code:: ipython3 fig, ax = plt.subplots(1, figsize=(14,7)) ax = sns.boxplot( x='location', y='value', data=res.query("value >= 0.0"), fliersize=0, palette='deep', ax=ax) ax.set_ylim([0, 750]) ax.set_ylabel("$PM_{2.5}\;[\mu gm^{-3}]$", fontsize=18) ax.set_xlabel("") sns.despine(offset=10) plt.xticks(rotation=90) plt.show() .. image:: api_files/api_36_0.png If we remember from above, there was at least one location where many parameters were measured. Let’s go ahead and look at that location and see if there is any correlation among parameters! .. code:: ipython3 res = api.measurements(city='Delhi', location='Anand Vihar', limit=1000, df=True) # Which params do we have? res.parameter.unique() .. parsed-literal:: array(['o3', 'no2', 'so2', 'pm10', 'pm25'], dtype=object) .. code:: ipython3 df = pd.DataFrame() for u in res.parameter.unique(): _df = res[res['parameter'] == u][['value']] _df.columns = [u] # Merge the dataframes together df = pd.merge(df, _df, left_index=True, right_index=True, how='outer') # Get rid of rows where not all exist df.dropna(how='any', inplace=True) g = sns.PairGrid(df, diag_sharey=False) g.map_lower(sns.kdeplot, cmap='Blues_d') g.map_upper(plt.scatter) g.map_diag(sns.kdeplot, lw=3) plt.show() .. image:: api_files/api_39_0.png For kicks, let’s go ahead and look at a timeseries of :math:`SO_2` data in Hawai’i. Quiz: What do you expect? Did you know that Hawai’i has a huge :math:`SO_2` problem? .. code:: ipython3 res = api.measurements(city='Hilo', parameter='so2', limit=10000, df=True) # Print out the statistics on a per-location basiss res.groupby(['location'])['value'].describe() .. raw:: html
count mean std min 25% 50% 75% max
location
Hawaii Volcanoes NP 355.0 0.007392 0.035338 0.000 0.000 0.000 0.000 0.408
Hilo 438.0 0.002486 0.004487 0.000 0.001 0.001 0.002 0.050
Kona 450.0 0.003273 0.004264 0.001 0.001 0.002 0.004 0.039
Ocean View 455.0 0.011490 0.021762 0.000 0.001 0.004 0.011 0.182
Pahala 422.0 0.039166 0.070299 0.000 0.004 0.010 0.040 0.554
.. code:: ipython3 fig, ax = plt.subplots(1, figsize=(10, 5)) for group, df in res.groupby('location'): # Query the data to only get positive values and resample to hourly _df = df.query("value >= 0.0").resample('6h').mean() # Convert from ppm to ppb _df['value'] *= 1e3 # Multiply the value by 1000 to get from ppm to ppb _df.value.plot(ax=ax, label=group) ax.legend(loc='best') ax.set_ylabel("$SO_2 \; [ppb]$", fontsize=18) ax.set_xlabel("") sns.despine(offset=5) plt.show() .. image:: api_files/api_42_0.png **NOTE:** These values are for 6h means. The local readings can actually get much, much higher (>5 ppm!) when looking at 1min data.