Plotting Functions#
Atmospy is a library for making useful, professional figures for atmospheric chemistry and air quality professionals. It is built on top of matplotlib, pandas, and seaborn. Below, we will walk through examples of how to use atmospy and how you can extend it with seaborn and matplotlib.
import atmospy
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
atmospy.set_theme()
Comparing Data from Multiple Sensors#
Often, we find the need to compare multiple variables against one
another. In the air sensor world, this may be the regression of two
variables against one another or one air sensor against a reference
sensor. Using the atmospy.regplot
, we can easily plot the regression
between two variables, fit a linear model, and display the fit
parameters on the figure itself:
df = atmospy.load_dataset("air-sensors-pm")
# plot the Reference measurement vs Sensor A using defaults
atmospy.regplot(df, x="Reference", y="Sensor A");

Under the hood, atmospy.regplot
is simply making a call to seaborn’s
jointplot
function with a few added pieces including adding a unity
line and displaying the fit parameters for the linear model in the
legend.
The call returns a seaborn JointGrid
which is a group of subplots.
As seen above, the joint axis shows the relationship between the two
variables with a unity line (1:1), the best fit line (shown in solid
blue above), and the distributions for each variables shown on the
marginal axes.
If for some reason, you don’t want to fit a linear model to the data,
you can turn that functionality off by setting fit_reg=False
:
atmospy.regplot(df, x="Reference", y="Sensor A", fit_reg=False);

Because we’re just making a call to seaborn’s jointplot
function,
you can send along most configuration options as keyword arguments to
atmospy.regplot
. For example, if you wanted to change what is shown
on the marginal axes, you can:
atmospy.regplot(
df, x="Reference", y="Sensor A", marginal_kws={"bins": 25, "fill": False}
);

A JointGrid
object is returned, which allows you to continue
customizing as you’d like:
g = atmospy.regplot(df, x="Reference", y="Sensor A")
g.plot_marginals(sns.rugplot, color="r", height=-0.15, clip_on=False);

You can easily edit the marker choice, color of marker, or anything else as well:
atmospy.regplot(df, x="Reference", y="Sensor A", color="g", marker="^", alpha=0.15);

Evaluating Trends#
While we often look at timeseries data to visualize trends, it can be
difficult to easily detect trends with large - or long-running -
datasets. Often, it can be easier to look at hourly or daily averages.
The calendarplot
function makes it easy to visualize hourly data by
month or daily data by year.
Visualizing hourly data by month#
Some pollutants - such as ozone - have known diurnal patterns due to the photochemical processing of precursor pollutants during the day. Others, such as carbon monoxide and other combustion-related pollutants, may show strong patterns associated with traffic patterns if near a source. Visualizing hourly data can make it easy to identify trends.
ozone = atmospy.load_dataset("us-ozone")
# we only want to use a single site for now
single_site = ozone[ozone["Location UUID"] == ozone["Location UUID"].unique()[0]]
atmospy.calendarplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
freq="hour",
);

The figure above shows the time of day on the y-axis and the day of month on the x-axis. By looking across the figure from left-to-right, you can quickly see that ozone - for the most part - is higher between 12-6PM than during the early morning. However, the figure above - using the defaults - doesn’t look super appealing, so let’s go ahead and modify it a bit to make it larger, explicitly set the min and max values, and chage to a different color palette:
atmospy.calendarplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
freq="hour",
xlabel="Day of Month",
height=4,
cmap="flare",
vmin=0,
vmax=80,
title="Ozone in [Month]",
);

Much better! By default, we aggregate data by the mean
value. We can
also change that up to plot by another aggregate, such as the max
:
atmospy.calendarplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
freq="hour",
xlabel="Day of Month",
height=4,
cmap="flare",
vmin=0,
vmax=80,
title="Peak Ozone in [Month]",
agg="max",
);

In this instance, there isn’t much of a difference as the data is reported hourly, so the mean and max will be the same (oops). With a more highly time-resolved dataset, this would look a bit different.
Visualizing daily data over the course of a year#
We can also visualize daily data over the course of the year - be default, we plot the daily average for the pollutant of choice:
atmospy.calendarplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
freq="day",
cbar=False,
height=2.5,
linewidths=0.1,
);
/home/runner/work/atmospy/atmospy/atmospy/trends.py:75: UserWarning: FixedFormatter should only be used together with FixedLocator
ax.xaxis.set_ticklabels([

You end up with a nice visualization of long-term data that looks a lot like a GitHub contribution graph. It would be even better if we chose a dataset that is complete!
Visualizing data completeness#
While we mostly plot the trend in pollutant concentration, we can easily
adapt the same figures to plot data completeness by simply switching the
agg
argument from mean
to count
:
atmospy.calendarplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
freq="hour",
xlabel="Day of Month",
height=4,
cmap="flare",
vmin=0,
vmax=1,
title="Ozone Data Completeness in [Month]",
agg="count",
);

Visualizing diel trends#
A diel cycle is a pattern that recurs every 24h. Often, we see diel
trends with pollutants that are driven by daytime chemistry such as
ozone or are associated with traffic patterns such as carbon monoxide,
nitrogen oxides, and other combustion tracers. We can easily visualize
the diel trend using the atmospy.dielplot
function:
atmospy.dielplot(data=single_site, x="Timestamp Local", y="Sample Measurement")

By default, we plot the mean value as a solid line and the interquartile range as the shaded region. We can also customize the figure to adjust the axis, add a title, and otherwise configure the plot via keyword arguments:
atmospy.dielplot(
data=single_site,
x="Timestamp Local",
y="Sample Measurement",
title="Diel Trend in Ozone",
ylim=(0, None),
ylabel="$O_3\;[ppb]$",
plot_kws={
"lw": 4,
},
);

Identifying Sources#
Understanding from which direction a pollutant originated is a common
question in air pollution work. A pollution rose - a variant of the more
common wind rose - is one way to visualize this. A pollution rose is a
polar plot that visualizes the originating direction and intensity of a
specific pollutant. In atmospy, you can use the
atmospy.pollutionroseplot
to visualize this:
# load an example dataset with MET info
met = atmospy.load_dataset("air-sensors-met")
atmospy.pollutionroseplot(
data=met,
ws="ws",
wd="wd",
pollutant="pm25",
calm=0.0,
);

Data for the pollutant of choice are grouped by both the range of
pollutant (as set by bins
) and by wind direction (as set by
segments
). To ensure that data is not included when the wind speed
is calm (as set by calm
), we first remove that data and indicate on
the figure above using the blank center point at the middle of the
figure - we assign calm periods evenly across all directions. In the
figure above, something like a few percent of data records are
registered as calm.
This is important because wind direction data during calm winds isn’t really relevant and/or statistically significant depending on the instrument used to measure the wind speed and wind direction.
In the figure above, the longer the bar, the more data records that are associated with it - in other words, that is the direction from which most data came. The color of a bar indicates the pollutant concentration. In the figure above, most of the PM2.5 is coming from the south-west and south-east. With the pollutant bin sizes set so large, it’s hard to see a pollution differential, so let’s go ahead and modify that:
atmospy.pollutionroseplot(
data=met,
ws="ws",
wd="wd",
pollutant="pm25",
bins=[0, 8, 15, 25, 35, 50, 100],
segments=32,
calm=0.0,
suffix="$µgm^{-3}$",
title="$PM_{2.5}$ by Direction at an Unknown Location",
);

The above figure has quite a bit more resolution along both the theta
and radii as we modified the bins
and segments
parameters. You
can define bins
to be any array-like structure so long as they’re
numeric. pollutionroseplot
will always add inf
(infinity) at the
end if you didn’t include it so that there is always a catch-all bin for
values higher than the max specified. You can manually define this list
to be whatever resolution or chunkiness you’d like, or, you can create
an evenly-spaced array using NumPy: np.linspace(0, 100, 10)
.
The segments
parameter is a bit different - you define an integer
number of segments to divide the 360 degrees into and the code will
automatically handle it.
Above, we went ahead and showed some of the additional configuration
options that you can include such as the suffix
, which labels the
legend, and the title
which adds a title to the figure.
Facets and other Fun Things#
Now that you’ve seen how to use the individual plots above, we’re going to go over some of the advanced features that are available by leveraging seaborn’s great grid functionality.
Often, it can be useful to draw multiple versions of the same figure with slight differences in what’s being plotted. For example, you may want to plot a pollution rose, but plot each month of data separately. Or, you may want to plot a diel profile, but show the results for weekdays vs weekends on separate plots so that you can easily see the difference caused by traffic patterns.
To do so, we can use seaborn’s FacetGrid
. Seaborn has a great
tutorial on how to use FacetGrid
’s
here that we
advise you read. Next, we’ll go over several examples that showcase some
of the cool things you can do with atmospy
and seaborn
together.
FacetGrid
and the pollutionroseplot
#
Using the example dataset (air-sensors-met
), let’s plot the
pollution rose separetely for each month (the dataset contains data for
April through November). To do so, we will use the FacetGrid
function from seaborn
. First, we will add a column that will serve
as the dimension of the figure. In this case, we want to extract the
month:
# load the example dataset
met = atmospy.load_dataset("air-sensors-met")
# add a column that extracts the month from the timestamp_local column
met.loc[:, "Month"] = met["timestamp_local"].dt.month_name()
# print the first 5 records
met.head()
timestamp_local | wd | ws | pm1 | pm25 | pm10 | temp | rh | Month | |
---|---|---|---|---|---|---|---|---|---|
4 | 2023-05-01 00:00:00 | 229.296667 | 0.862333 | 2.189933 | 2.370022 | 5.517010 | 14.603333 | 53.111667 | May |
5 | 2023-05-01 01:00:00 | 233.001667 | 0.911500 | 2.360152 | 2.516538 | 4.623340 | 13.715000 | 54.445000 | May |
6 | 2023-05-01 02:00:00 | 230.731667 | 0.930667 | 2.499550 | 2.682507 | 4.816372 | 13.310000 | 58.595000 | May |
7 | 2023-05-01 03:00:00 | 218.756667 | 0.892167 | 2.629282 | 2.806970 | 4.851060 | 13.125000 | 59.406667 | May |
8 | 2023-05-01 04:00:00 | 206.043333 | 0.791333 | 2.974715 | 3.184287 | 5.859437 | 12.210000 | 62.248333 | May |
As we can see above, we now have a column with the month name. In order
to use the FacetGrid
properly, we need to convert our wide-form
dataframe into a long-form dataframe. For a much better explanation of
the difference than I can provide, please read through the seaborn
explanation
here.
We can easily convert our dataframe to long-form by using the Pandas
melt
function:
met_long_form = met.melt(
id_vars=["timestamp_local", "Month", "ws", "wd"], value_vars=["pm25"]
)
# print the first 5 records
met_long_form.head()
timestamp_local | Month | ws | wd | variable | value | |
---|---|---|---|---|---|---|
0 | 2023-05-01 00:00:00 | May | 0.862333 | 229.296667 | pm25 | 2.370022 |
1 | 2023-05-01 01:00:00 | May | 0.911500 | 233.001667 | pm25 | 2.516538 |
2 | 2023-05-01 02:00:00 | May | 0.930667 | 230.731667 | pm25 | 2.682507 |
3 | 2023-05-01 03:00:00 | May | 0.892167 | 218.756667 | pm25 | 2.806970 |
4 | 2023-05-01 04:00:00 | May | 0.791333 | 206.043333 | pm25 | 3.184287 |
Next, we will set up our FacetGrid
and tell it to use the Month
column as the dimension and to wrap every 3 so that it all fits into one
nice figure:
# set up the FacetGrid
g = sns.FacetGrid(
data=met_long_form,
col="Month",
col_wrap=3,
subplot_kws={"projection": "polar"},
despine=False,
)
# map the dataframe using the pollutionroseplot function
g.map_dataframe(
atmospy.pollutionroseplot,
ws="ws",
wd="wd",
pollutant="value",
faceted=True,
segments=20,
suffix="$µgm^{-3}$",
)
# add the legend and place it where it looks nice
g.add_legend(
title="$PM_{2.5}$", bbox_to_anchor=(0.535, 0.2), handlelength=1, handleheight=1
);

Not bad for 3 lines of code (yes, they are split across more than 3 lines for readability, but still - 3 lines!)!
FacetGrid
and the dielplot
#
Next, we’re going to go ahead and explore what we can do with the
dielplot
on a FacetGrid
. Like above, we can simply plot a
slightly different subset of the data in each column - let’s go ahead
and walk through an example. Let’s plot the diel profile for black
carbon on weekdays versus weekends.
First, we need to load our example dataset and modify it a bit to provide the information we want to facet by:
# load the data
bc = atmospy.load_dataset("us-bc")
# select just one random location for now
bc_single_site = bc[bc["Local Site Name"] == bc["Local Site Name"].unique()[0]]
# create a column that sets a bool if the date is a weekend
bc_single_site.loc[:, "Is Weekend"] = (
bc_single_site["Timestamp Local"].dt.day_name().isin(["Saturday", "Sunday"])
)
# convert to long-form for faceting
bc_long_form = bc_single_site.melt(
id_vars=["Timestamp Local", "Is Weekend"], value_vars=["Sample Measurement"]
)
# print the first 5 records
bc_long_form.head()
Timestamp Local | Is Weekend | variable | value | |
---|---|---|---|---|
0 | 2023-01-01 00:00:00 | True | Sample Measurement | 2.76 |
1 | 2023-01-01 01:00:00 | True | Sample Measurement | 2.55 |
2 | 2023-01-01 02:00:00 | True | Sample Measurement | 3.18 |
3 | 2023-01-01 03:00:00 | True | Sample Measurement | 1.64 |
4 | 2023-01-01 04:00:00 | True | Sample Measurement | 1.79 |
Now that we have our data prepared, we can set up a FacetGrid
like
above and define the column to facet by as the new column we just
created:
g = sns.FacetGrid(
data=bc_long_form,
col="Is Weekend",
# let's adjust the aspect ratio for funsies
aspect=1.25,
)
g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value");

While this isn’t the greatest example, we can see there is difference between weekday’s and weekend’s in the early morning, though the IQR band is quite wide. At some point, we will add some better example datasets so that these figures are more impressive. For now, they work!
Now, what if we had two locations that we wanted to compare? Let’s go ahead and pull data for two sites and show the difference by site rather than by weekday/weekend:
# load the data
bc = atmospy.load_dataset("us-bc")
# select just one random location for now
bc_multi_site = bc[bc["Local Site Name"].isin(bc["Local Site Name"].unique()[0:2])]
# create a column that sets a bool if the date is a weekend
bc_multi_site.loc[:, "Is Weekend"] = (
bc_multi_site["Timestamp Local"].dt.day_name().isin(["Saturday", "Sunday"])
)
# convert to long-form for faceting
bc_long_form = bc_multi_site.melt(
id_vars=["Timestamp Local", "Is Weekend", "Local Site Name"],
value_vars=["Sample Measurement"],
)
# print the first 5 records
bc_long_form.head()
Timestamp Local | Is Weekend | Local Site Name | variable | value | |
---|---|---|---|---|---|
0 | 2023-01-01 00:00:00 | True | I-25 | Sample Measurement | 2.76 |
1 | 2023-01-01 01:00:00 | True | I-25 | Sample Measurement | 2.55 |
2 | 2023-01-01 02:00:00 | True | I-25 | Sample Measurement | 3.18 |
3 | 2023-01-01 03:00:00 | True | I-25 | Sample Measurement | 1.64 |
4 | 2023-01-01 04:00:00 | True | I-25 | Sample Measurement | 1.79 |
Let’s set up the FacetGrid
and plot the diel trend by location:
g = sns.FacetGrid(
bc_long_form,
row="Local Site Name",
hue="Local Site Name",
aspect=1.25,
)
g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value")
# update the y-axis limit to force to zero
g.set(ylim=(0, None));

Rather than plotting on the column, we plotted different locations on the row, which sets up the next figure nicely - let’s go ahead and use the same data set, but plot by weekday/weekend AND two different locations together:
g = sns.FacetGrid(
bc_long_form,
row="Local Site Name",
col="Is Weekend",
hue="Local Site Name",
aspect=1.25,
)
g.map_dataframe(atmospy.dielplot, x="Timestamp Local", y="value")
# update the y-axis limit to force to zero
g.set(ylim=(0, None), ylabel="Black Carbon")
# update the titles to take up less space
g.set_titles("{row_name} | Weekend = {col_name}");

FacetGrid
and the calendarplot
#
To complete our introduction to faceting atmospy
, we will go over
the calendarplot
function. We don’t advise faceting by unique
sensors or location - those would likely be better off as individual
figures. However, since both variants of the plot (e.g., by month or by
year) only plot a single unit of time, we can facet on this. In other
words, we can plot multiple months at a time and view the
hourly-averaged data. Let’s take ozone as an example:
ozone = atmospy.load_dataset("us-ozone")
# we only want to use a single site for now
single_site = ozone[ozone["Location UUID"] == ozone["Location UUID"].unique()[0]]
# add the month name to facet on
single_site.loc[:, "Month"] = single_site["Timestamp Local"].dt.month_name()
# set up the facetgrid
g = sns.FacetGrid(data=single_site, col="Month", col_wrap=3, height=4)
# map the dataframe to the grid
g.map_dataframe(
atmospy.calendarplot,
x="Timestamp Local",
y="Sample Measurement",
freq="hour",
cmap="YlGn",
units="ppb",
linewidths=0.1,
cbar=False,
faceted=True,
)
# update the labels
g.set(xlabel="Day of Month", ylabel="Time of Day");

For now, we will avoid showing the yearly plot in a faceted manner while we work on finding a suitable dataset.