A data ensemble approach for real-time air quality forecasting using extremely randomized trees and deep neural networks

NEURAL COMPUTING & APPLICATIONS (2020)

Journal

NEURAL COMPUTING & APPLICATIONS

Volume 32, Issue 11, Pages 7563-7579

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s00521-019-04287-6

Keywords

Machine learning; Urban air quality; Atmospheric chemistry; Statistical analysis; Extra-trees method

Funding

Department of Earth and Atmospheric Science (EAS Research Grant) of the University of Houston
National Institute of Environmental Research (NIER)

Abstract

Six generalized machine learning (ML) ensemble models were developed to forecast, in real time, the hourly ozone concentrations of the following day, and were applied to all of 2017 in the city of Seoul, South Korea. The training dataset was built from observed meteorology and air pollution parameters for the 2014-2016 period. The ensemble models fuse two regression models: a low-ozone peak model and a high-ozone model. For both, extremely randomized trees and deep neural networks were used. A regularization approach was also adopted that adjusts the model toward capturing higher ozone peaks by resampling the training dataset based on the peaks. Adopting the proposed ML ensemble forecasting method over single-model ML techniques as part of mainstream practice for air quality forecasting will be beneficial for several reasons. First, the proposed method, which captures daily maximum ozone concentrations during the high-ozone season (April-September), reduces the ozone peak prediction error by 5 to 30 ppb. Second, unlike station-specific (independent) ML models, whose training data contain more frequent low-ozone values, the proposed models are trained with a uniformly distributed dataset, so they are more generalizable. As a result, unlike station-specific models, they retain their accuracy (yearly index of agreement, IOA = 0.84-0.89) at all stations, with an increase in IOA. The proposed models also make predictions several times faster, because a single training run is enough to predict for an entire station network. Based on a categorical analysis of the training dataset, an algorithm was proposed for selecting the most suitable model for each month. The best model further improves the accuracy of both the ML ensemble and individual models by up to 2.4%. This study shows that the ML ensemble modeling approach is a fast, reliable, and robust technique that can benefit environmental decision-makers in urban regions.
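The abstract reports accuracy as a yearly IOA but does not define it. A minimal sketch follows, assuming IOA means Willmott's index of agreement, the form commonly used in air quality model evaluation; the paper's exact variant is not stated here. The function name `index_of_agreement` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1.0 is a perfect match, 0.0 is none.

    Assumption: this is the IOA variant meant in the abstract.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    obs_mean = fmean(observed)

    # Sum of squared differences between forecasts and observations.
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))

    # "Potential error": how far both series deviate from the observed mean.
    potential_error = sum(
        (abs(p - obs_mean) + abs(o - obs_mean)) ** 2
        for o, p in zip(observed, predicted)
    )

    if potential_error == 0:
        # Every value equals the observed mean, so the series are identical.
        return 1.0
    return 1.0 - squared_error / potential_error


# Hypothetical hourly ozone values in ppb, for illustration only.
sample_observed = [22.0, 35.0, 61.0, 78.0, 54.0]
sample_predicted = [25.0, 33.0, 55.0, 70.0, 58.0]
sample_ioa = index_of_agreement(sample_observed, sample_predicted)
```

Because the denominator grows with the spread of both series around the observed mean, a forecast that only predicts the mean scores 0 even when its absolute error is small, which is why the metric rewards capturing peaks.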
