Comparison of data-driven methods for linking extreme precipitation events to local and large-scale meteorological variables

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2023)

Journal

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT

Publisher

SPRINGER

DOI: 10.1007/s00477-023-02511-3

Keywords

Extreme precipitation; Meteorological drivers; Machine learning; Logistic regression; ROC curve

Automated Summary

Extreme precipitation events can have severe negative consequences for society, the economy, and the environment, so it is important to understand when and why they occur. This study compares logistic regression with three commonly used supervised machine learning algorithms for determining whether an extreme event occurred locally. The results show that logistic regression performs similarly to the more complex machine learning algorithms, highlighting the value of comparing different modeling approaches.

Abstract

Extreme precipitation events can lead to severe negative consequences for society, the economy, and the environment. It is therefore crucial to understand when such events occur. The literature offers many methods for analyzing their connection to meteorological drivers, and there has been recent interest in using machine learning methods instead of classical statistical models. While a few studies in climate research have compared the performance of these two approaches, their conclusions are inconsistent. To determine whether an extreme event occurred locally, we trained models using logistic regression and three commonly used supervised machine learning algorithms tailored for discrete outcomes: random forests, neural networks, and support vector machines. We used five explanatory variables (geopotential height at 500 hPa, convective available potential energy, total column water, sea surface temperature, and air surface temperature) from ERA5, and local data from the Danish Meteorological Institute. During the variable selection process, we found that convective available potential energy has the strongest relationship with extreme events. Our results showed that logistic regression performs similarly to more complex machine learning algorithms in terms of discrimination, as measured by the area under the receiver operating characteristic curve (ROC AUC) and other performance metrics specialized for unbalanced datasets. Specifically, the ROC AUC for logistic regression was 0.86, while the best-performing machine learning algorithm achieved a ROC AUC of 0.87. This study emphasizes the value of comparing machine learning and classical regression modeling, especially when employing a limited set of well-established explanatory variables.
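The abstract reports discrimination as ROC AUC because extreme events are rare and the dataset is unbalanced. The following is a minimal sketch of why that matters. It is not the authors' code, and the labels and scores are hypothetical toy values. The functions `roc_auc` and `accuracy` are illustrative helpers written for this example. ROC AUC is computed here as the probability that a randomly chosen event day receives a higher score than a randomly chosen non-event day, with ties counted as one half.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of correct predictions when scores >= threshold are called events."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 18 ordinary days followed by 2 extreme days.
labels = [0] * 18 + [1] * 2

# A "model" that never predicts an event.
always_no = [0.0] * 20

# A model that ranks event days highly but raises two false alarms.
informative = [0.1] * 16 + [0.6, 0.7] + [0.65, 0.9]

for name, scores in [("always_no", always_no), ("informative", informative)]:
    print(name, accuracy(labels, scores), roc_auc(labels, scores))
```

On this toy data both score vectors have the same accuracy, yet only the second separates event days from non-event days, which is what ROC AUC captures. The pairwise computation is quadratic in the sample size, so a rank-based implementation would be preferable for real station records.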
