4.7 Article

The use of maximum entropy and ecological niche factor analysis to decrease uncertainties in samples for urban gain models

Journal

GISCIENCE & REMOTE SENSING
Volume 60, Issue 1, Pages -

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/15481603.2023.2222980

Keywords

Uncertainty; Urban gain modeling; Data-driven models; Maximum entropy; Imbalance learning; Data mining


This study aims to present and develop novel strategies for sampling and building training datasets in order to enhance the performance of data-driven models in urban gain modeling (UGM). The maximum entropy (ME) and ecological niche factor analysis (ENFA) models were used to select pure non-change samples with minimal uncertainty for training datasets in Isfahan and Tabriz cities in Iran. The results showed that the ME model was able to identify relatively pure non-change samples and properly remove impure non-change samples from the training dataset.
Uncertainty is a common problem in spatial modeling and geographical information systems (GIS), and urban gain modeling (UGM) carries uncertainty in several of its dimensions and components. Data sampling is particularly important in UGM because it can introduce substantial uncertainty into model results and affect their precision and accuracy. A poorly sampled or biased dataset can lead to inaccurate predictions and reduced model performance. This paper presents and develops novel strategies for sampling and building training datasets that can enhance the performance of data-driven models. Specifically, the study used maximum entropy (ME) and ecological niche factor analysis (ENFA) models to select pure non-change samples with minimal uncertainty for the UGM training datasets of the cities of Isfahan and Tabriz in Iran. Urban gain was analyzed over two time intervals for Tabriz (1992-2002 and 2002-2012) and two for Isfahan (1994-2004 and 2004-2014). Nine and 14 urban gain drivers were used in the UGM of Isfahan and Tabriz, respectively. After the ME and ENFA models produced training datasets of change and non-change samples with the lowest uncertainty, three well-known models, namely random forest (RF), artificial neural network (ANN), and support vector machine (SVM), were used for the modeling. In addition, the ME and ENFA models used to investigate the uncertainty of the sampling procedure were also applied as one-class prediction models. Compared with existing studies, the proposed ME-based sampling strategy increased the area under the receiver operating characteristic curve (AUROC), figure of merit, producer's accuracy, and overall accuracy by 5.5%, 5%, 5%, and 3%, respectively, in the validation phase for Isfahan and by 5%, 6%, 14%, and 17%, respectively, for Tabriz.
For Isfahan, the accuracies of the ME (AUROC = 0.649) and ENFA (AUROC = 0.661) one-class models were close to those of the ANN-ME (AUROC = 0.646), ANN-ENFA (AUROC = 0.619), and RF-ENFA (AUROC = 0.631) models but differed significantly from that of the RF-ME (AUROC = 0.737) model. For Tabriz, the accuracies of the ME (AUROC = 0.657) and ENFA (AUROC = 0.688) one-class models were lower than those of the two-class RF-ME (AUROC = 0.852) and ANN-ME (AUROC = 0.778) models. The results showed that the ME model was able to identify relatively pure non-change samples and properly remove impure non-change samples from the training dataset. The study found that binary models are preferable to one-class models and showed that an optimal sampling strategy is an essential step in UGM because it can decrease uncertainty. Modelers should therefore adopt efficient sampling methods.
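
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each cell already has a suitability score from a one-class model such as ME (higher means more urban-like), keeps the non-change cells with the lowest scores as "pure" non-change samples, and computes AUROC by pairwise comparison. The function names, the `n_keep` parameter, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def select_pure_non_change(
    suitability: Sequence[float], changed: Sequence[bool], n_keep: int
) -> List[int]:
    """Return indices of the n_keep non-change cells with the lowest suitability.

    Assumption: a low one-class suitability score marks a cell that is unlikely
    to be a mislabeled or soon-to-change cell, i.e. a "pure" non-change sample.
    """
    candidates = [i for i, is_change in enumerate(changed) if not is_change]
    candidates.sort(key=lambda i: suitability[i])  # lowest suitability first
    return candidates[:n_keep]


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUROC as the fraction of (change, non-change) pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the sample count,
    which is fine for a toy example but not for full rasters.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: eight cells, three of which became urban.
suitability = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.05]
changed = [True, True, False, False, False, True, False, False]

# Keep the three least urban-like non-change cells; cell 2 (score 0.7) is
# treated as an impure non-change sample and left out of the training set.
pure_non_change = select_pure_non_change(suitability, changed, n_keep=3)
toy_auroc = auroc(suitability, changed)
```

In a workflow like the one described, the selected non-change indices would be combined with the change samples to train a two-class model such as RF, ANN, or SVM, and AUROC would then be computed on held-out validation data rather than on the scores used for sampling.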
