4.8 Article

An Ensemble Machine-Learning Model To Predict Historical PM2.5 Concentrations in China from Satellite Data

Journal

ENVIRONMENTAL SCIENCE & TECHNOLOGY
Volume 52, Issue 22, Pages 13260-13269

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.est.8b02917

Funding

  1. NASA Applied Sciences Program [NNX16AQ28G]
  2. National Institutes of Health [R01ES027892]
  3. NASA [896367, NNX16AQ28G] Funding Source: Federal RePORTER

The long satellite aerosol data record enables assessment of historical PM2.5 levels in regions where routine PM2.5 monitoring began only recently. However, most previous models reported decreased prediction accuracy when predicting PM2.5 levels outside the model-training period. In this study, we proposed an ensemble machine learning approach that provided reliable PM2.5 hindcast capabilities. Missing satellite data were first filled by multiple imputation. The modeling domain, China, was then divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine learning models, including random forest, generalized additive model, and extreme gradient boosting, was trained in each region separately. Finally, a generalized additive ensemble model was developed to combine predictions from the different algorithms. The ensemble prediction characterized the spatiotemporal distribution of daily PM2.5 well, with a cross-validation (CV) R² (RMSE) of 0.79 (21 μg/m³). The cluster-based subregion models outperformed national models and improved the CV R² by approximately 0.05. Compared with previous studies, our model provided more accurate out-of-range predictions at the daily level (R² = 0.58, RMSE = 29 μg/m³) and monthly level (R² = 0.76, RMSE = 16 μg/m³). Our hindcast modeling system allows for the construction of unbiased historical PM2.5 levels.
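
The two accuracy measures quoted in the abstract, R² and RMSE, can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: R² is taken to be the coefficient of determination (the authors may instead report a squared correlation), the ensemble step is replaced by a plain equal-weight average rather than the generalized additive ensemble model described above, and all function names and numbers are hypothetical toy values, not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. ug/m3)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions_by_model):
    """Equal-weight average across models for each sample.

    This is a simple stand-in, NOT the generalized additive ensemble
    model that the abstract describes.
    """
    n_models = len(predictions_by_model)
    return [sum(sample) / n_models for sample in zip(*predictions_by_model)]


# Hypothetical daily PM2.5 values (ug/m3) at five held-out monitor-days.
observed = [35.0, 80.0, 120.0, 60.0, 45.0]
base_predictions = [
    [40.0, 70.0, 110.0, 65.0, 50.0],   # toy "random forest" output
    [30.0, 85.0, 125.0, 55.0, 35.0],   # toy "gradient boosting" output
    [38.0, 75.0, 100.0, 70.0, 50.0],   # toy "additive model" output
]

ensemble = average_ensemble(base_predictions)
print("R2:", round(r_squared(observed, ensemble), 3))
print("RMSE:", round(rmse(observed, ensemble), 2))
```

Applying the same two functions to predictions for dates outside the training period, rather than to cross-validation folds, corresponds to the out-of-range (hindcast) evaluation that the abstract contrasts with the CV results.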
