Article

The dilemma of determining the superiority of data mining models: optimal sampling balance and end users' perspectives matter

Journal

BULLETIN OF ENGINEERING GEOLOGY AND THE ENVIRONMENT
Volume 79, Issue 4, Pages 1707-1720

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s10064-019-01687-9

Keywords

MaxEnt; SVM; ANFIS-ICA; ROC curve; False negative; False positive

This work highlights two understated issues in landslide susceptibility modeling: (1) how assumptions about data sampling balance can significantly affect model performance and (2) how different modeling perspectives, in particular a preference for specific model attributes, can considerably influence the model selection process. Three data mining models and their two ensemble modes formed the basis of our experiment, namely, support vector machine (SVM), maximum entropy (MaxEnt), the ensemble of the adaptive neuro-fuzzy inference system and the imperialistic competitive algorithm (ANFIS-ICA), and their addition/multiplicity ensemble modes (WAE and WME). Further, we imitated four community groups and the main goals they aspire to, namely, a speculative builder or a financial risk analyst (seeking the highest economic opportunities), people or NGOs (seeking the lowest human casualties and economic losses), the government (seeking a trade-off between the two preceding goals), and a mechanical engineering supervisor (seeking the most robust and stable model design). Results revealed that, contrary to assumptions made by several researchers in the literature, a 70:30% partition of training/validation samples did not give satisfactory results in our study area; instead, a 60:40% partition appears to be a good trade-off between the models' learning and prediction powers. Moreover, the area under the receiver operating characteristic curve (AUROC) suggested that the ANFIS-ICA hybrid shows excellent results compared with its counterparts. Regarding model selection at the optimal sample balance of 60:40%, we found that although the WME model showed the lowest type II error (false negative) in both the training and validation stages, it also showed the highest type I error (false positive), while the other models fell somewhere in between. Conversely, the WAE outperformed the other models in terms of the lowest type I error. Further, the robustness analysis suggested that the SVM and MaxEnt models can provide more stable results than their counterparts. Hence, in the process of model selection, perspectives matter the most, as no single model performs best for every problem.
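
The selection logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The labels, scores, model names (`model_a`, `model_b`), the 0.5 cut-off, and the mapping from each perspective to an error type are all assumptions made for illustration, and the robustness criterion of the engineering perspective is left out because it would need repeated runs.

```python
# Toy illustration only: labels and scores are made up, not the paper's data.

def error_rates(labels, scores, threshold=0.5):
    """Return (type I rate, type II rate) at a given cut-off.

    Type I  = false positive rate: stable cells flagged as landslide-prone.
    Type II = false negative rate: landslide cells flagged as stable.
    """
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)


def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pick_model(metrics, perspective):
    """Pick a model name under a hypothetical end-user criterion."""
    if perspective == "developer":    # assumed: fewest false alarms (type I)
        return min(metrics, key=lambda m: metrics[m]["type1"])
    if perspective == "public":       # assumed: fewest missed landslides (type II)
        return min(metrics, key=lambda m: metrics[m]["type2"])
    if perspective == "government":   # assumed: balance of both error types
        return min(metrics,
                   key=lambda m: metrics[m]["type1"] + metrics[m]["type2"])
    raise ValueError(f"unknown perspective: {perspective}")


labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = landslide, 0 = no landslide
toy_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.1],  # conservative scorer
    "model_b": [0.9, 0.9, 0.8, 0.6, 0.7, 0.6, 0.2, 0.1],  # liberal scorer
}

metrics = {}
for name, scores in toy_scores.items():
    type1, type2 = error_rates(labels, scores)
    metrics[name] = {"type1": type1, "type2": type2,
                     "auroc": auroc(labels, scores)}

for perspective in ("developer", "public", "government"):
    print(perspective, "->", pick_model(metrics, perspective))
```

On this toy data the two hypothetical models trade one error type for the other, so the preferred model changes with the perspective applied, which is the kind of dilemma the abstract describes.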
