Article

Improving accuracy of air pollution exposure measurements: Statistical correction of a municipal low-cost airborne particulate matter sensor network

Journal

ENVIRONMENTAL POLLUTION
Volume 268

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.envpol.2020.115833

Keywords

Air pollution; Low-cost sensor; Machine learning; On-the-fly calibration; Plantower sensor; Cross-validation

Funding

  1. 2018 Bloomberg Mayors Challenge from Bloomberg Philanthropies
  2. University of Colorado Boulder's Undergraduate Research Opportunities Program
  3. Earth Lab through the University of Colorado Boulder's Grand Challenge Initiative


This study investigated linear and random forest models to correct PM2.5 measurements from low-cost air quality sensors in Denver against measurements from higher-end instruments. The random forest model with all time-varying covariates was found to be the most accurate for long-term data, while a multiple linear regression model using past sensor data plus additional variables performed best for on-the-fly correction.
Low-cost air quality sensors can help increase the spatial and temporal resolution of air pollution exposure measurements. These sensors, however, most often produce data of lower accuracy than higher-end instruments. In this study, we investigated linear and random forest models to correct PM2.5 measurements from the Denver Department of Public Health and Environment (DDPHE)'s network of low-cost sensors against measurements from co-located U.S. Environmental Protection Agency Federal Equivalent Method (FEM) monitors. Our training set included data from five DDPHE sensors from August 2018 through May 2019. Our testing set included data from two newly deployed DDPHE sensors from September 2019 through mid-December 2019. In addition to PM2.5, temperature, and relative humidity from the low-cost sensors, we explored using additional temporal and spatial variables to capture unexplained variability in sensor measurements. We evaluated results using spatial and temporal cross-validation techniques. For the long-term dataset, a random forest model with all time-varying covariates and length of arterial roads within 500 m was the most accurate (testing RMSE = 2.9 µg/m³ and R² = 0.75; leave-one-location-out (LOLO) validation metrics on the training set: RMSE = 2.2 µg/m³ and R² = 0.93). For on-the-fly correction, we found that a multiple linear regression model using the past eight weeks of low-cost sensor PM2.5, temperature, and humidity data plus a near-highway indicator predicted each new week of data best (testing RMSE = 3.1 µg/m³ and R² = 0.78; LOLO validation metrics on the training set: RMSE = 2.3 µg/m³ and R² = 0.90). The statistical methods detailed here will be used to correct low-cost sensor measurements to better understand PM2.5 pollution within the city of Denver. This work can also guide similar implementations in other municipalities by highlighting the accuracy gained from including variables other than temperature and relative humidity when correcting low-cost sensor PM2.5 data. (C) 2020 The Author(s). Published by Elsevier Ltd.
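
The leave-one-location-out (LOLO) validation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. As a simplifying assumption, a one-variable least-squares line (reference PM2.5 against raw sensor PM2.5) stands in for the paper's multiple linear regression and random forest models, and the held-out predictions are pooled across locations before RMSE and R² are computed, which the abstract does not specify. The function names, the record layout, and the toy readings are all hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root mean squared error between reference and corrected values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def lolo_validate(records):
    """Leave-one-location-out validation.

    records: list of (location, sensor_pm25, reference_pm25) tuples.
    Each location is held out once; the correction is fitted on the
    remaining locations and applied to the held-out one.
    Returns (rmse, r_squared) over the pooled held-out predictions
    (pooling is an assumption of this sketch).
    """
    locations = sorted({loc for loc, _, _ in records})
    y_true, y_pred = [], []
    for held_out in locations:
        train = [(s, r) for loc, s, r in records if loc != held_out]
        test = [(s, r) for loc, s, r in records if loc == held_out]
        a, b = fit_line([s for s, _ in train], [r for _, r in train])
        y_true += [r for _, r in test]
        y_pred += [a + b * s for s, _ in test]
    return rmse(y_true, y_pred), r_squared(y_true, y_pred)


if __name__ == "__main__":
    # Made-up toy readings in µg/m³, not data from the study.
    toy = [
        ("site_a", 10.0, 7.5), ("site_a", 20.0, 12.0), ("site_a", 30.0, 16.5),
        ("site_b", 12.0, 8.0), ("site_b", 25.0, 15.0), ("site_b", 40.0, 21.0),
        ("site_c", 8.0, 6.0), ("site_c", 18.0, 11.5), ("site_c", 35.0, 19.0),
    ]
    print(lolo_validate(toy))
```

Holding out an entire location, rather than random rows, tests whether a correction fitted at some sites transfers to a site the model has never seen, which is the situation a newly deployed sensor presents.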
