Article

Non-linear probabilistic calibration of low-cost environmental air pollution sensor networks for neighborhood level spatiotemporal exposure assessment

Journal

Publisher

Springer Nature
DOI: 10.1038/s41370-022-00493-y

Keywords

Exposure modeling; Air pollution; Sensors; Geospatial analyses

Funding

  1. U.S. Environmental Protection Agency [RD835871]
  2. U.S. Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health [T42 OH0008428]
  3. National Science Foundation [DMS-1915803]
  4. National Institute of Environmental Health Sciences (NIEHS) [DMS-1915803, R01ES033739]
  5. National Science Foundation Graduate Research Fellowship Program [DGE1752134]
  6. National Institute of Environmental Health Sciences of the National Institutes of Health [K99ES029116, R00ES029116]

This study presents a framework for direct field-calibration of low-cost air pollution sensors using probabilistic gradient boosted decision trees (GBDT). The results show that the probabilistic GBDT model improves point and distribution accuracies compared to linear regression models, particularly at high concentrations and on monitors not included in the training set. The study also demonstrates the use of the GBDT model in conducting probabilistic spatial assessments of human exposure on a neighborhood level.
BACKGROUND: Low-cost sensor networks for monitoring air pollution are an effective tool for expanding spatial resolution beyond the capabilities of existing state and federal reference monitoring stations. However, low-cost sensor data commonly exhibit non-linear biases with respect to environmental conditions that cannot be captured by linear models, and therefore require extensive lab calibration. Further, these calibration models traditionally produce point estimates or uniform-variance predictions, which limits their downstream use in exposure assessment.
OBJECTIVE: Build direct field-calibration models using probabilistic gradient boosted decision trees (GBDT) that eliminate the need for resource-intensive lab calibration and that can be used to conduct probabilistic exposure assessments on the neighborhood level.
METHODS: Using data from Plantower A003 particulate matter (PM) sensors deployed in Baltimore, MD from November 2018 through November 2019, a fully probabilistic NGBoost GBDT was trained on raw data from sensors co-located with a federal reference monitoring station and compared against linear regression trained on lab-calibrated sensor data. The NGBoost predictions were then used in a Monte Carlo interpolation process to generate high spatial resolution probabilistic exposure gradients across Baltimore.
RESULTS: We demonstrate that direct field-calibration of the raw PM2.5 sensor data using a probabilistic GBDT improves point and distribution accuracies compared to the linear model, particularly at reference measurements exceeding 25 μg/m³, and also on monitors not included in the training set.
SIGNIFICANCE: We provide a framework for utilizing the GBDT to conduct probabilistic spatial assessments of human exposure with inverse distance weighting that predicts the probability of a given location exceeding an exposure threshold and provides percentiles of exposure.
These probabilistic spatial exposure assessments can be scaled by time and space with minimal modifications. Here, we used the probabilistic exposure assessment methodology to create high quality spatial-temporal PM2.5 maps on the neighborhood-scale in Baltimore, MD.
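The Monte Carlo interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each calibrated sensor prediction is summarized as a Normal mean and standard deviation, that sensor errors are independent, and that draws are clipped at zero. The function names, coordinates, concentrations, and the 25 μg/m³ threshold are hypothetical values chosen for illustration.

```python
import math
import random


def idw(point, sites, values, power=2.0):
    """Inverse distance weighted average of `values` observed at `sites`."""
    num = den = 0.0
    for (x, y), v in zip(sites, values):
        d = math.hypot(point[0] - x, point[1] - y)
        if d == 0.0:
            return v  # the point coincides with a sensor location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def monte_carlo_idw(point, sites, means, sds, threshold, n_draws=2000, seed=0):
    """Propagate per-sensor predictive uncertainty through IDW by sampling.

    Assumption: each sensor's calibrated PM2.5 prediction is Normal(mean, sd)
    and independent of the other sensors.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # One plausible "map realization": sample every sensor once.
        sample = [max(0.0, rng.gauss(m, s)) for m, s in zip(means, sds)]
        draws.append(idw(point, sites, sample))
    draws.sort()

    def pct(q):
        # Simple empirical percentile from the sorted draws.
        return draws[min(n_draws - 1, int(q * n_draws))]

    return {
        "p_exceed": sum(d > threshold for d in draws) / n_draws,
        "p05": pct(0.05),
        "p50": pct(0.50),
        "p95": pct(0.95),
    }


# Hypothetical sensor layout (arbitrary planar units) and predictions (ug/m3).
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
means = [10.0, 30.0, 20.0]
sds = [2.0, 5.0, 3.0]
result = monte_carlo_idw((0.5, 0.5), sites, means, sds, threshold=25.0)
print(result)
```

Repeating the call over a grid of points would give, per grid cell, an exceedance probability and exposure percentiles, which is the kind of output the abstract describes for neighborhood-scale maps.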
