4.7 Article

Comparison of data-driven methods for linking extreme precipitation events to local and large-scale meteorological variables

Publisher

SPRINGER
DOI: 10.1007/s00477-023-02511-3

Keywords

Extreme precipitation; Meteorological drivers; Machine learning; Logistic regression; ROC curve

Extreme precipitation events can have severe negative consequences for society, the economy, and the environment, so it is important to understand when and why they occur. This study compares the performance of logistic regression and three commonly used supervised machine learning algorithms in determining whether extreme events occur locally. The results show that logistic regression performs similarly to more complex machine learning algorithms, highlighting the value of comparing different modeling approaches.
Extreme precipitation events can lead to severe negative consequences for society, the economy, and the environment. It is therefore crucial to understand when such events occur. In the literature, there are a vast number of methods for analyzing their connection to meteorological drivers. However, there has been recent interest in using machine learning methods instead of classic statistical models. While a few studies in climate research have compared the performance of these two approaches, their conclusions are inconsistent. To determine whether an extreme event occurred locally, we trained models using logistic regression and three commonly used supervised machine learning algorithms tailored for discrete outcomes: random forests, neural networks, and support vector machines. We used five explanatory variables (geopotential height at 500 hPa, convective available potential energy, total column water, sea surface temperature, and air surface temperature) from ERA5, and local data from the Danish Meteorological Institute. During the variable selection process, we found that convective available potential energy has the strongest relationship with extreme events. Our results showed that logistic regression performs similarly to more complex machine learning algorithms regarding discrimination as measured by the area under the receiver operating characteristic curve (ROC AUC) and other performance metrics specialized for unbalanced datasets. Specifically, the ROC AUC for logistic regression was 0.86, while the best-performing machine learning algorithm achieved a ROC AUC of 0.87. This study emphasizes the value of comparing machine learning and classical regression modeling, especially when employing a limited set of well-established explanatory variables.
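
As an illustration of the discrimination metric reported above, the following is a minimal sketch of how a ROC AUC can be computed from predicted probabilities on an unbalanced binary outcome. It uses the rank-based (Mann-Whitney) formulation of the AUC. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study or its data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the ROC AUC as the probability that a randomly chosen
    positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not from the paper):
# 1 = day with an extreme precipitation event, 0 = ordinary day.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
# Hypothetical predicted event probabilities from some classifier.
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.02, 0.80, 0.40, 0.35, 0.25]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.4f}")
```

Because the AUC depends only on how the scores rank events against non-events, it is not inflated by the rarity of the positive class, which is one reason it suits a comparison such as logistic regression against more complex classifiers on rare events.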
