Article

Assessment of Machine Learning Algorithms in Short-term Forecasting of PM10 and PM2.5 Concentrations in Selected Polish Agglomerations

Journal

AEROSOL AND AIR QUALITY RESEARCH
Volume 21, Issue 7

Publisher

TAIWAN ASSOC AEROSOL RES-TAAR
DOI: 10.4209/aaqr.200586

Keywords

PM10; PM2.5; Air quality; Machine learning; Short-term forecasting

Air pollution remains a significant issue for Europeans living in urban areas, with Poland being one of the most polluted countries in Europe. Accurate PMx forecasting is essential to alert residents to pollution episodes. Machine learning, particularly the XGBoost model, shows promise for short-term PMx prediction in urban areas.
Air pollution continues to have a significant impact on Europeans living in urban areas, and episodes of elevated PMx are responsible for a large number of premature deaths (mostly due to heart disease and stroke) each year. According to the annual EEA reports, Poland is one of the most polluted countries in Europe, experiencing high PMx concentrations during winter that mostly result from large emissions and unfavourable weather conditions in combination with environmental features. Thus, in addition to implementing municipal mitigation strategies, alerting residents to pollution episodes through accurate PMx forecasting is necessary. This research aimed to assess the feasibility of short-term PMx forecasting via machine learning (ML) and the subsequent identification of the primary meteorological covariates. The data comprised 10 years of hourly winter PM10 and PM2.5 concentrations measured at 11 urban air quality monitoring stations, including background, traffic, and industrial sites, in four large Polish agglomerations, viz., Poznan, Krakow, Lodz, and Gdansk, which cover areas with high population density and diverse environments that extend from the Baltic Sea coast (Tricity) through the lowlands (Poznan and Lodz) to the highlands (Krakow). We tested four ML models: AIC-based stepwise regression, two tree-based algorithms (random forests and XGBoost), and neural networks. Employing analysis and cross-validation, we found that XGBoost performed the best, followed by random forests and neural networks, and stepwise regression performed the worst. This ranking was apparent in the threshold exceedance values of the binary forecasts obtained via regression. Overall, our results confirm the high applicability of ML to short-term air quality prediction with the perfect prog approach.
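
The abstract mentions binary threshold-exceedance forecasts obtained from the regression output. A minimal sketch of how such a verification could be set up is given below. It is not the authors' procedure: the 50 µg/m³ threshold, the function name `exceedance_scores`, the choice of scores, and the sample values are all illustrative assumptions, since the abstract states neither the threshold nor the scores used.

```python
import math


def exceedance_scores(observed, forecast, threshold=50.0):
    """Turn continuous forecasts into exceedance events and score them.

    The threshold (in µg/m³) is an assumed, illustrative value.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for obs, fc in zip(observed, forecast):
        obs_event = obs > threshold  # exceedance actually occurred
        fc_event = fc > threshold    # exceedance was forecast
        if obs_event and fc_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif fc_event:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(numerator, denominator):
        # Return NaN when a score is undefined (no events in the sample).
        return numerator / denominator if denominator else math.nan

    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "correct_negatives": correct_negatives,
        # Probability of detection: share of observed events that were forecast.
        "pod": ratio(hits, hits + misses),
        # False alarm ratio: share of forecast events that did not occur.
        "far": ratio(false_alarms, hits + false_alarms),
        # Critical success index: hits over all observed or forecast events.
        "csi": ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical hourly PM10 values in µg/m³, invented for illustration only.
observed = [32.0, 61.5, 48.0, 75.2, 55.1, 20.3]
forecast = [35.0, 58.0, 52.5, 70.1, 44.9, 25.0]
scores = exceedance_scores(observed, forecast)
print(scores)
```

Computing these scores separately for each model would allow a ranking on exceedance detection to be compared with a ranking based on continuous error measures.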
