A data ensemble approach for real-time air quality forecasting using extremely randomized trees and deep neural networks

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 32, Issue 11, Pages 7563-7579

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-019-04287-6

Keywords

Machine learning; Urban air quality; Atmospheric chemistry; Statistical analysis; Extra-trees method

Funding

  1. Department of Earth and Atmospheric Science (EAS Research Grant) of the University of Houston
  2. National Institute of Environmental Research (NIER)

Six generalized machine learning (ML) ensemble models were developed to predict, in real time, the hourly ozone concentrations of the following day. The models were used to produce next-day hourly ozone forecasts for all of 2017 in the city of Seoul, South Korea. The training dataset was prepared from observed meteorological and air pollution parameters for the 2014-2016 period. Each ensemble model fuses two regression models: a low-ozone peak model and a high-ozone model. For both, extremely randomized trees and deep neural networks were used. A regularization approach was also adopted that adjusts the model toward capturing higher ozone peaks by resampling the training dataset based on the peaks. Adopting the proposed ML ensemble forecasting method over single-model ML techniques as part of mainstream air quality forecasting practice would be beneficial for several reasons. First, the proposed method, which captures daily maximum ozone concentrations during the high-ozone season (April-September), reduces the ozone peak prediction error by 5 to 30 ppb. Second, unlike station-specific (independent) ML models, whose training data contain more frequent low-ozone values, the proposed models are trained on a uniformly distributed dataset, so they are more generalizable. As a result, unlike station-specific models, they retain their accuracy (yearly index of agreement, IOA = 0.84-0.89) at all stations, with an increase in IOA. The proposed models also make predictions several times faster, requiring only one-time training to predict for an entire station network. Based on a categorical analysis of the training dataset, an algorithm was proposed for selecting the most suitable model for each month. The best model further improves the accuracy of both the ML ensemble and individual models by up to 2.4%. This study shows that the ML ensemble modeling approach is a fast, reliable, and robust technique that can benefit environmental decision-makers in urban regions.
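
Two ideas in the abstract can be illustrated with a short sketch: the IOA score used to report accuracy, and resampling the training set so that high-ozone peaks carry more weight. The code below is not the authors' implementation. It assumes that IOA follows Willmott's standard index of agreement, and it stands in a simple oversampling rule for the paper's peak-based resampling, whose details the abstract does not give. The names `index_of_agreement`, `oversample_peaks`, `threshold_ppb`, and `factor` are hypothetical.

```python
import random


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition of IOA).

    Returns a value in [0, 1]; 1 means perfect agreement.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between predictions and observations.
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Potential error: spread of both series around the observed mean.
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    if den == 0:
        # Every observation and prediction equals the observed mean.
        return 1.0
    return 1.0 - num / den


def oversample_peaks(rows, targets, threshold_ppb, factor, seed=0):
    """Hypothetical peak-based resampling: each sample whose target is at
    or above threshold_ppb appears `factor` times; all others appear once.
    """
    rng = random.Random(seed)
    resampled = []
    for row, y in zip(rows, targets):
        copies = factor if y >= threshold_ppb else 1
        resampled.extend([(row, y)] * copies)
    rng.shuffle(resampled)  # avoid long runs of identical samples
    return [r for r, _ in resampled], [y for _, y in resampled]
```

In this sketch, a regressor trained on the output of `oversample_peaks` would see high-ozone hours more often than in the raw record, and `index_of_agreement` could then score its hourly forecasts against observations. The threshold and repetition factor are free parameters here and are not values from the paper.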
