Article

Estimating PM2.5 concentrations via random forest method using satellite, auxiliary, and ground-level station dataset at multiple temporal scales across China in 2017

Journal

SCIENCE OF THE TOTAL ENVIRONMENT
Volume 778

Publisher

ELSEVIER
DOI: 10.1016/j.scitotenv.2021.146288

Keywords

PM2.5; Machine learning; Multiple data sources; Cross-validation; Mapping

Funding

  1. National Natural Science Foundation of China [41771576]
  2. Fund Project of Shaanxi Key Laboratory of Land Consolidation [2019JC11]

A random forest (RF) model was used to estimate PM2.5 concentrations at 1 km resolution across China in 2017 from ground-level monitoring data, AOD, meteorological data, and auxiliary data. Validation with three ten-fold cross-validation methods at several temporal scales showed good performance, and the RF model outperformed most statistical regression models for calibrating PM2.5 concentrations. The findings provide an important dataset for epidemiological and air-pollutant exposure risk studies in China.
Fine particulate matter with an aerodynamic diameter of less than 2.5 μm (PM2.5) has adverse impacts on public health and the environment. Estimating high-resolution PM2.5 concentrations at moderate scales remains a great challenge. The current study calibrated PM2.5 concentrations at 1 km resolution across China in 2017 using ground-level monitoring data, Aerosol Optical Depth (AOD), meteorological data, and auxiliary data via a Random Forest (RF) model. Three ten-fold cross-validation (CV) methods (sample-based, time-based, and spatial-based), combined with the coefficient of determination (R²), Root-Mean-Square Error (RMSE), and Mean Predictive Error (MPE), were used for validation at daily, monthly, heating-season, and non-heating-season temporal scales. Finally, a distribution map of PM2.5 concentrations was produced from the RF model. The RF model performed well, with a relatively high sample-based cross-validation R² of 0.74, a low RMSE of 16.29 μg·m⁻³, and a small MPE of -0.282 μg·m⁻³. The RF model also inferred PM2.5 concentrations well at urban scales, except for the Chengyu (CY) urban agglomeration. North China, the CY urban agglomeration, and the northwest of China exhibited relatively high PM2.5 pollution, especially in the heating season. The RF model in the present study was more robust than most statistical regression models for calibrating PM2.5 concentrations. The outcomes can supply an up-to-date scientific dataset for epidemiological and air-pollutant exposure risk studies across China. © 2021 Elsevier B.V. All rights reserved.
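
The abstract names three ten-fold cross-validation schemes and three metrics but does not say how folds are formed or how MPE is signed. The following is a minimal sketch of one common reading, not the authors' implementation. It assumes that sample-based CV holds out individual records, time-based CV holds out whole days, and spatial-based CV holds out whole stations, and that MPE is the signed mean of predicted minus observed values (which would be consistent with the negative MPE reported, though the convention is not stated). All function and variable names are hypothetical.

```python
import math
import random


def assign_folds(keys, n_splits=10, seed=0):
    """Give every row a fold id so that rows sharing a key share a fold."""
    unique = sorted(set(keys))
    random.Random(seed).shuffle(unique)
    fold_of_key = {k: i % n_splits for i, k in enumerate(unique)}
    return [fold_of_key[k] for k in keys]


def cv_folds(stations, days, scheme, n_splits=10, seed=0):
    """Fold ids for the three validation schemes (assumed definitions)."""
    if scheme == "sample":      # each record is held out independently
        keys = list(range(len(stations)))
    elif scheme == "time":      # all records from one day are held out together
        keys = list(days)
    elif scheme == "spatial":   # all records from one station are held out together
        keys = list(stations)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return assign_folds(keys, n_splits, seed)


def r2_rmse_mpe(observed, predicted):
    """Return (R^2, RMSE, MPE); MPE is the signed mean of predicted - observed (assumption)."""
    n = len(observed)
    resid = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), sum(resid) / n


# Toy layout: 20 hypothetical stations, 5 days each.
stations = [s for s in range(20) for _ in range(5)]
days = [d for _ in range(20) for d in range(5)]
spatial_folds = cv_folds(stations, days, "spatial")
```

For each fold id, a model would be trained on the rows outside that fold and used to predict the rows inside it; the out-of-fold predictions are then passed to `r2_rmse_mpe`. Any regressor can be plugged in at that step, for example scikit-learn's `RandomForestRegressor`. Holding out whole stations or whole days generally gives a stricter test than holding out single records, because the model cannot rely on other records from the same station or day.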
