4.7 Article

Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations

Journal

ENVIRONMENTAL HEALTH PERSPECTIVES
Volume 130, Issue 3, Pages -

Publisher

US DEPT HEALTH HUMAN SCIENCES PUBLIC HEALTH SCIENCE
DOI: 10.1289/EHP9752

Keywords

-

Funding

  1. Australian Research Council [DP210102076]
  2. Australian National Health and Medical Research Council (NHMRC) [APP2000581]
  3. NHMRC [APP1163693, APP2008813, APP2009866]
  4. Monash Graduate Scholarship
  5. Monash International Tuition Scholarship
  6. CAR PhD Top-up Scholarship
  7. China Scholarship Council [201806010405, 201906320051]


A multiple-level stacked ensemble machine learning framework was developed to improve the estimation of daily ground-level PM2.5 concentrations. The framework outperformed benchmark models in predicting PM2.5 concentrations in Italy based on data from monitoring stations.
BACKGROUND: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) is essential for environmental health risk assessment.
OBJECTIVES: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework to improve the estimation of daily ground-level PM2.5 concentrations.
METHODS: A deep ensemble machine learning framework (DEML) was developed to estimate daily PM2.5 concentrations. The framework has a three-stage structure. At the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 concentrations for training the next-stage learners. At the second stage, three meta-models [RF, XGBoost, and generalized linear model (GLM)] were used to estimate PM2.5 concentrations using a combination of the original data set and the predictions from the first-stage models. At the third stage, a nonnegative least squares (NNLS) algorithm was employed to obtain the optimal weights for PM2.5 estimation. As an example, we applied the DEML to data from 133 monitoring stations in Italy to predict daily PM2.5 at each 1 km × 1 km grid cell across Italy from 2015 to 2019. We evaluated model performance using 10-fold cross-validation (CV) and compared it with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)].
RESULTS: The PM2.5 prediction performance of DEML [coefficient of determination (R²) = 0.87 and root mean square error (RMSE) = 5.38 μg/m³] was superior to that of every benchmark model (R² of 0.51, 0.76, 0.83, 0.70, and 0.83 for GBM, SVM, RF, XGBoost, and SL, respectively). DEML displayed reliable performance in capturing the spatiotemporal variations of PM2.5 in Italy.
DISCUSSION: The proposed DEML framework achieved outstanding performance in PM2.5 estimation and could be used as a tool for more accurate environmental exposure assessment.
