Article

Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples

Journal

GONDWANA RESEARCH
Volume 123, Pages 198-216

Publisher

ELSEVIER
DOI: 10.1016/j.gr.2022.05.012

Keywords

Landslide susceptibility assessment; Machine learning; Sample ratio; Bayesian optimization

Machine learning models have been widely used in landslide susceptibility assessment, and a proper ratio of landslide to non-landslide samples is crucial for model accuracy. This paper proposes a Bayesian optimization method to optimize this sample ratio and thereby improve the performance of machine learning models. The results show that the optimized ratio improves the performance of the support vector machine, random forest and gradient boost decision tree models.
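The sketch below shows what a P/N sample ratio means in practice. It draws a training set at a given ratio and scores predictions with AUC, the metric reported in the abstract. It is a minimal sketch under stated assumptions:

* The ratio is taken as P/N, so 0.5 means one landslide cell per two non-landslide cells.
* Non-landslide samples are drawn uniformly at random from a candidate pool.
* `sample_at_ratio` and `auc` are hypothetical helpers, not the authors' code.

```python
import random


def sample_at_ratio(positives, negative_pool, p_to_n, seed=0):
    """Combine all landslide (positive) samples with randomly drawn
    non-landslide (negative) samples so that P/N is approximately p_to_n.

    Assumption (hypothetical helper): negatives are drawn uniformly at
    random, without replacement, from negative_pool.
    """
    if p_to_n <= 0:
        raise ValueError("p_to_n must be positive")
    n_neg = round(len(positives) / p_to_n)
    if n_neg > len(negative_pool):
        raise ValueError("not enough non-landslide candidates for this ratio")
    rng = random.Random(seed)
    negatives = rng.sample(list(negative_pool), n_neg)
    samples = list(positives) + negatives
    labels = [1] * len(positives) + [0] * n_neg
    return samples, labels


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the share of
    (positive, negative) pairs in which the positive scores higher, with ties
    counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    landslides = list(range(50))        # 50 hypothetical landslide cells
    candidates = list(range(50, 1050))  # 1000 hypothetical non-landslide cells
    _, y = sample_at_ratio(landslides, candidates, p_to_n=0.5)
    print(y.count(1), y.count(0))                   # 50 100
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise AUC costs O(P·N), which is fine for illustration. For large rasters, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.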
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The accuracy of machine learning-based LSA often hinges on the ratio of landslide to non-landslide (or positive/negative, P/N) samples. A proper P/N sample ratio significantly improves the performance of machine learning-based LSA, whereas an improper ratio can cause inadequate training or data pollution. Conventionally, the P/N sample ratio is determined from experience or by trial and error, which introduces substantial uncertainty. This paper proposes a Bayesian optimization method to optimize the P/N sample ratio for machine learning models. First, Anhua County in Hunan Province, China, is selected as the study area because of the numerous landslides that have occurred there in recent years. Second, three representative machine learning models, namely the support vector machine (SVM), the random forest (RF) and the gradient boost decision tree (GBDT), are adopted to assess landslide susceptibility. Third, a Bayesian optimization algorithm is used to obtain the optimal P/N sample ratio, taking into account the effect of different training/test set ratios. Finally, the improved models and the corresponding landslide susceptibility maps are established using the optimal P/N sample ratio. The results show that the performance of SVM, RF and GBDT is improved with the optimized P/N sample ratio. The RF model achieves the highest area under the receiver operating characteristic curve (AUC) value (0.840, improved by 1.3%), followed by GBDT (0.831, improved by 1.3%) and SVM (0.775, improved by 0.7%). Moreover, RF and GBDT are better suited than SVM to handling sample imbalance in LSA. The Bayesian optimization algorithm is therefore recommended for optimizing the P/N sample ratio in machine learning-based LSA models.

(c) 2022 International Association for Gondwana Research. Published by Elsevier B.V. All rights reserved.
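The abstract does not state which surrogate model or acquisition function its Bayesian optimization uses. The sketch below therefore assumes a common textbook setup: a Gaussian-process surrogate with a squared-exponential (RBF) kernel, and the expected-improvement acquisition maximized over a grid of candidate ratios. The following are all hypothetical:

* the search bounds (P/N between 0.2 and 2.0);
* the kernel length scale;
* the iteration counts;
* the name `bayes_opt_ratio`.

In a real LSA workflow, `objective(r)` would resample the training set at ratio `r`, fit SVM, RF or GBDT, and return the validation AUC. It might also average over several training/test splits, but how the authors combined splits is not stated. Here a toy function stands in for it.

```python
import math
import random

# Sketch only: a GP surrogate with expected improvement is an assumed setup,
# not necessarily the method used in the paper.


def rbf(a, b, length=0.3):
    """Squared-exponential kernel on ratios rescaled to [0, 1]."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L @ L.T == K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L.T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement for maximization under a Gaussian N(mu, sigma^2)."""
    gain = mu - best - xi
    if sigma < 1e-9:
        return max(gain, 0.0)
    z = gain / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gain * cdf + sigma * pdf


def bayes_opt_ratio(objective, lo=0.2, hi=2.0, n_init=3, n_iter=12, seed=0):
    """Hypothetical helper: maximize objective(r) over P/N ratios r in [lo, hi]."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        u = [(x - lo) / (hi - lo) for x in xs]
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        z = [(y - mean) / std for y in ys]  # standardized observations
        K = [[rbf(a, b) + (1e-6 if i == j else 0.0)  # jitter for stability
              for j, b in enumerate(u)] for i, a in enumerate(u)]
        L = cholesky(K)
        alpha = backward(L, forward(L, z))  # alpha = K^{-1} z
        best = max(z)

        def acquisition(r):
            ur = (r - lo) / (hi - lo)
            k = [rbf(ur, a) for a in u]
            mu = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean
            v = forward(L, k)
            var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # rbf(ur, ur) == 1
            return expected_improvement(mu, math.sqrt(var), best)

        r_next = max(grid, key=acquisition)
        xs.append(r_next)
        ys.append(objective(r_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


if __name__ == "__main__":
    def toy(r):
        # Toy stand-in for "validation AUC at ratio r", peaked at r = 0.6.
        return -(r - 0.6) ** 2

    print(bayes_opt_ratio(toy))
```

Maximizing the acquisition over a fixed grid keeps the sketch dependency-free. For real use, scikit-optimize's `gp_minimize` is a ready-made alternative; it minimizes, so the negated AUC would be passed.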