Plotting Functions#

Atmospy is a library for making useful, professional figures for atmospheric chemistry and air quality professionals. It is built on top of matplotlib, pandas, and seaborn. Below, we will walk through examples of how to use atmospy and how you can extend it with seaborn and matplotlib.

import atmospy
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

atmospy.set_theme()

Comparing Data from Multiple Sensors#

Often, we find the need to compare multiple variables against one another. In the air sensor world, this may be the regression of two variables against one another or one air sensor against a reference sensor. Using the atmospy.regplot, we can easily plot the regression between two variables, fit a linear model, and display the fit parameters on the figure itself:

df = atmospy.load_dataset("air-sensors-pm")

# plot the Reference measurement vs Sensor A using defaults
atmospy.regplot(df, x="Reference", y="Sensor A");
../_images/plots_3_0.png

Under the hood, atmospy.regplot is simply making a call to seaborn’s jointplot function with a few added pieces including adding a unity line and displaying the fit parameters for the linear model in the legend.

The call returns a seaborn JointGrid which is a group of subplots. As seen above, the joint axis shows the relationship between the two variables with a unity line (1:1), the best fit line (shown in solid blue above), and the distributions for each variables shown on the marginal axes.

If for some reason, you don’t want to fit a linear model to the data, you can turn that functionality off by setting fit_reg=False:

atmospy.regplot(df, x="Reference", y="Sensor A", fit_reg=False);
../_images/plots_5_0.png

Because we’re just making a call to seaborn’s jointplot function, you can send along most configuration options as keyword arguments to atmospy.regplot. For example, if you wanted to change what is shown on the marginal axes, you can:

atmospy.regplot(
    df, x="Reference", y="Sensor A", marginal_kws={"bins": 25, "fill": False}
);
../_images/plots_7_0.png

A JointGrid object is returned, which allows you to continue customizing as you’d like:

g = atmospy.regplot(df, x="Reference", y="Sensor A")
g.plot_marginals(sns.rugplot, color="r", height=-0.15, clip_on=False);
../_images/plots_9_0.png

You can easily edit the marker choice, color of marker, or anything else as well:

atmospy.regplot(df, x="Reference", y="Sensor A", color="g", marker="^", alpha=0.15);
../_images/plots_11_0.png

Identifying Sources#

Understanding from which direction a pollutant originated is a common question in air pollution work. A pollution rose - a variant of the more common wind rose - is one way to visualize this. A pollution rose is a polar plot that visualizes the originating direction and intensity of a specific pollutant. In atmospy, you can use the atmospy.pollutionroseplot to visualize this:

# load an example dataset with MET info
met = atmospy.load_dataset("air-sensors-met")

atmospy.pollutionroseplot(
    data=met,
    ws="ws",
    wd="wd",
    pollutant="pm25",
    calm=0.0,
);
../_images/plots_30_0.png

Data for the pollutant of choice are grouped by both the range of pollutant (as set by bins) and by wind direction (as set by segments). To ensure that data is not included when the wind speed is calm (as set by calm), we first remove that data and indicate on the figure above using the blank center point at the middle of the figure - we assign calm periods evenly across all directions. In the figure above, something like a few percent of data records are registered as calm.

This is important because wind direction data during calm winds isn’t really relevant and/or statistically significant depending on the instrument used to measure the wind speed and wind direction.

In the figure above, the longer the bar, the more data records that are associated with it - in other words, that is the direction from which most data came. The color of a bar indicates the pollutant concentration. In the figure above, most of the PM2.5 is coming from the south-west and south-east. With the pollutant bin sizes set so large, it’s hard to see a pollution differential, so let’s go ahead and modify that:

atmospy.pollutionroseplot(
    data=met,
    ws="ws",
    wd="wd",
    pollutant="pm25",
    bins=[0, 8, 15, 25, 35, 50, 100],
    segments=32,
    calm=0.0,
    suffix="$µgm^{-3}$",
    title="$PM_{2.5}$ by Direction at an Unknown Location",
);
../_images/plots_32_0.png

The above figure has quite a bit more resolution along both the theta and radii as we modified the bins and segments parameters. You can define bins to be any array-like structure so long as they’re numeric. pollutionroseplot will always add inf (infinity) at the end if you didn’t include it so that there is always a catch-all bin for values higher than the max specified. You can manually define this list to be whatever resolution or chunkiness you’d like, or, you can create an evenly-spaced array using NumPy: np.linspace(0, 100, 10).

The segments parameter is a bit different - you define an integer number of segments to divide the 360 degrees into and the code will automatically handle it.

Above, we went ahead and showed some of the additional configuration options that you can include such as the suffix, which labels the legend, and the title which adds a title to the figure.

Facets and other Fun Things#

Now that you’ve seen how to use the individual plots above, we’re going to go over some of the advanced features that are available by leveraging seaborn’s great grid functionality.

Often, it can be useful to draw multiple versions of the same figure with slight differences in what’s being plotted. For example, you may want to plot a pollution rose, but plot each month of data separately. Or, you may want to plot a diel profile, but show the results for weekdays vs weekends on separate plots so that you can easily see the difference caused by traffic patterns.

To do so, we can use seaborn’s FacetGrid. Seaborn has a great tutorial on how to use FacetGrid’s here that we advise you read. Next, we’ll go over several examples that showcase some of the cool things you can do with atmospy and seaborn together.

FacetGrid and the pollutionroseplot#

Using the example dataset (air-sensors-met), let’s plot the pollution rose separetely for each month (the dataset contains data for April through November). To do so, we will use the FacetGrid function from seaborn. First, we will add a column that will serve as the dimension of the figure. In this case, we want to extract the month:

# load the example dataset
met = atmospy.load_dataset("air-sensors-met")

# add a column that extracts the month from the timestamp_local column
met.loc[:, "Month"] = met["timestamp_local"].dt.month_name()

# print the first 5 records
met.head()
timestamp_local wd ws pm1 pm25 pm10 temp rh Month
4 2023-05-01 00:00:00 229.296667 0.862333 2.189933 2.370022 5.517010 14.603333 53.111667 May
5 2023-05-01 01:00:00 233.001667 0.911500 2.360152 2.516538 4.623340 13.715000 54.445000 May
6 2023-05-01 02:00:00 230.731667 0.930667 2.499550 2.682507 4.816372 13.310000 58.595000 May
7 2023-05-01 03:00:00 218.756667 0.892167 2.629282 2.806970 4.851060 13.125000 59.406667 May
8 2023-05-01 04:00:00 206.043333 0.791333 2.974715 3.184287 5.859437 12.210000 62.248333 May

As we can see above, we now have a column with the month name. In order to use the FacetGrid properly, we need to convert our wide-form dataframe into a long-form dataframe. For a much better explanation of the difference than I can provide, please read through the seaborn explanation here.

We can easily convert our dataframe to long-form by using the Pandas melt function:

met_long_form = met.melt(
    id_vars=["timestamp_local", "Month", "ws", "wd"], value_vars=["pm25"]
)

# print the first 5 records
met_long_form.head()
timestamp_local Month ws wd variable value
0 2023-05-01 00:00:00 May 0.862333 229.296667 pm25 2.370022
1 2023-05-01 01:00:00 May 0.911500 233.001667 pm25 2.516538
2 2023-05-01 02:00:00 May 0.930667 230.731667 pm25 2.682507
3 2023-05-01 03:00:00 May 0.892167 218.756667 pm25 2.806970
4 2023-05-01 04:00:00 May 0.791333 206.043333 pm25 3.184287

Next, we will set up our FacetGrid and tell it to use the Month column as the dimension and to wrap every 3 so that it all fits into one nice figure:

# set up the FacetGrid
g = sns.FacetGrid(
    data=met_long_form,
    col="Month",
    col_wrap=3,
    subplot_kws={"projection": "polar"},
    despine=False,
)

# map the dataframe using the pollutionroseplot function
g.map_dataframe(
    atmospy.pollutionroseplot,
    ws="ws",
    wd="wd",
    pollutant="value",
    faceted=True,
    segments=20,
    suffix="$µgm^{-3}$",
)

# add the legend and place it where it looks nice
g.add_legend(
    title="$PM_{2.5}$", bbox_to_anchor=(0.535, 0.2), handlelength=1, handleheight=1
);
../_images/plots_40_0.png

Not bad for 3 lines of code (yes, they are split across more than 3 lines for readability, but still - 3 lines!)!

FacetGrid and the dielplot#

Next, we’re going to go ahead and explore what we can do with the dielplot on a FacetGrid. Like above, we can simply plot a slightly different subset of the data in each column - let’s go ahead and walk through an example. Let’s plot the diel profile for black carbon on weekdays versus weekends.

First, we need to load our example dataset and modify it a bit to provide the information we want to facet by:

# load the data
bc = atmospy.load_dataset("us-bc")

# select just one random location for now
bc_single_site = bc[bc["Local Site Name"] == bc["Local Site Name"].unique()[0]]

# create a column that sets a bool if the date is a weekend
bc_single_site.loc[:, "Is Weekend"] = (
    bc_single_site["Timestamp Local"].dt.day_name().isin(["Saturday", "Sunday"])
)

# convert to long-form for faceting
bc_long_form = bc_single_site.melt(
    id_vars=["Timestamp Local", "Is Weekend"], value_vars=["Sample Measurement"]
)

# print the first 5 records
bc_long_form.head()
Timestamp Local Is Weekend variable value
0 2023-01-01 00:00:00 True Sample Measurement 2.76
1 2023-01-01 01:00:00 True Sample Measurement 2.55
2 2023-01-01 02:00:00 True Sample Measurement 3.18
3 2023-01-01 03:00:00 True Sample Measurement 1.64
4 2023-01-01 04:00:00 True Sample Measurement 1.79

Now that we have our data prepared, we can set up a FacetGrid like above and define the column to facet by as the new column we just created:

g = sns.FacetGrid(
    data=bc_long_form,
    col="Is Weekend",
    # let's adjust the aspect ratio for funsies
    aspect=1.25,
)
g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value");
../_images/plots_45_0.png

While this isn’t the greatest example, we can see there is difference between weekday’s and weekend’s in the early morning, though the IQR band is quite wide. At some point, we will add some better example datasets so that these figures are more impressive. For now, they work!

Now, what if we had two locations that we wanted to compare? Let’s go ahead and pull data for two sites and show the difference by site rather than by weekday/weekend:

# load the data
bc = atmospy.load_dataset("us-bc")

# select just one random location for now
bc_multi_site = bc[bc["Local Site Name"].isin(bc["Local Site Name"].unique()[0:2])]

# create a column that sets a bool if the date is a weekend
bc_multi_site.loc[:, "Is Weekend"] = (
    bc_multi_site["Timestamp Local"].dt.day_name().isin(["Saturday", "Sunday"])
)

# convert to long-form for faceting
bc_long_form = bc_multi_site.melt(
    id_vars=["Timestamp Local", "Is Weekend", "Local Site Name"],
    value_vars=["Sample Measurement"],
)

# print the first 5 records
bc_long_form.head()
Timestamp Local Is Weekend Local Site Name variable value
0 2023-01-01 00:00:00 True I-25 Sample Measurement 2.76
1 2023-01-01 01:00:00 True I-25 Sample Measurement 2.55
2 2023-01-01 02:00:00 True I-25 Sample Measurement 3.18
3 2023-01-01 03:00:00 True I-25 Sample Measurement 1.64
4 2023-01-01 04:00:00 True I-25 Sample Measurement 1.79

Let’s set up the FacetGrid and plot the diel trend by location:

g = sns.FacetGrid(
    bc_long_form,
    row="Local Site Name",
    hue="Local Site Name",
    aspect=1.25,
)

g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value")

# update the y-axis limit to force to zero
g.set(ylim=(0, None));
../_images/plots_49_0.png

Rather than plotting on the column, we plotted different locations on the row, which sets up the next figure nicely - let’s go ahead and use the same data set, but plot by weekday/weekend AND two different locations together:

g = sns.FacetGrid(
    bc_long_form,
    row="Local Site Name",
    col="Is Weekend",
    hue="Local Site Name",
    aspect=1.25,
)

g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value")

# update the y-axis limit to force to zero
g.set(ylim=(0, None), ylabel="Black Carbon")

# update the titles to take up less space
g.set_titles("{row_name} | Weekend = {col_name}");
../_images/plots_51_0.png

FacetGrid and the calendarplot#

To complete our introduction to faceting atmospy, we will go over the calendarplot function. We don’t advise faceting by unique sensors or location - those would likely be better off as individual figures. However, since both variants of the plot (e.g., by month or by year) only plot a single unit of time, we can facet on this. In other words, we can plot multiple months at a time and view the hourly-averaged data. Let’s take ozone as an example:

ozone = atmospy.load_dataset("us-ozone")

# we only want to use a single site for now
single_site = ozone[ozone["Location UUID"] == ozone["Location UUID"].unique()[0]]

# add the month name to facet on
single_site.loc[:, "Month"] = single_site["Timestamp Local"].dt.month_name()

# set up the facetgrid
g = sns.FacetGrid(data=single_site, col="Month", col_wrap=3, height=4)

# map the dataframe to the grid
g.map_dataframe(
    atmospy.calendarplot,
    x="Timestamp Local",
    y="Sample Measurement",
    freq="hour",
    cmap="YlGn",
    units="ppb",
    linewidths=0.1,
    cbar=False,
    faceted=True,
)

# update the labels
g.set(xlabel="Day of Month", ylabel="Time of Day");
../_images/plots_53_0.png

For now, we will avoid showing the yearly plot in a faceted manner while we work on finding a suitable dataset.