Article

Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

ECOLOGICAL MODELLING (2019)

Journal

ECOLOGICAL MODELLING

Volume 406, Pages 109-120

Publisher

ELSEVIER

DOI: 10.1016/j.ecolmodel.2019.06.002

Keywords

Spatial modeling; Machine-learning; Spatial autocorrelation; Hyperparameter tuning; Spatial cross-validation

Category

Ecology

Funding

EU LIFE Healthy Forest project [LIFE14 ENV/ES/000179]
German Scholars Organization/Carl Zeiss Foundation

Abstract

While applying machine-learning algorithms has become much simpler in recent years thanks to their well-documented integration into commonly used statistical programming languages (such as R or Python), ecological modeling still faces several practical challenges related to unbiased performance estimation. One of them is the influence of spatial autocorrelation on both hyperparameter tuning and performance estimation. Grouped cross-validation strategies have been proposed in recent years in environmental as well as medical contexts to reduce bias in estimates of predictive performance. In this study we show the effects of spatial autocorrelation on hyperparameter tuning and performance estimation by comparing the predictive performance of several widely used machine-learning algorithms, such as boosted regression trees (BRT), k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM), with that of traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones such as generalized additive models (GAM). Spatial and non-spatial cross-validation methods were used to evaluate model performance, with the aim of obtaining bias-reduced performance estimates. A detailed analysis of how sensitive hyperparameter tuning is to the resampling method (spatial or non-spatial) was performed. As a case study, the spatial distribution of the forest disease caused by Diplodia sapinea in the Basque Country (Spain) was investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Random forest (mean Brier score estimate of 0.166) outperformed all other methods with regard to predictive accuracy. Although the sensitivity to hyperparameter tuning differed between the ML algorithms, in most cases there were no substantial differences between spatial and non-spatial partitioning for hyperparameter tuning.
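The difference between non-spatial and spatial partitioning for cross-validation can be illustrated with a minimal Python sketch. It assumes that spatial folds are built by k-means clustering of the sample coordinates, which is one common way to implement spatial cross-validation; the abstract does not state which partitioning scheme the study used. The function names (`random_folds`, `kmeans_spatial_folds`, `mean_dist_to_fold_centroid`), the farthest-first initialization and the synthetic coordinates are hypothetical and serve only as illustration.

```python
import math
import random


def random_folds(n, k, seed=0):
    """Non-spatial CV: shuffle observation indices and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def kmeans_spatial_folds(coords, k, n_iter=100):
    """Spatial CV: k-means clustering of (x, y) coordinates; each cluster becomes one fold."""
    # Farthest-first initialization: each new center is the point farthest from all existing ones.
    centers = [coords[0]]
    while len(centers) < k:
        centers.append(max(coords, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest center; stop once assignments no longer change.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in coords]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of its members (an empty cluster keeps its old center).
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]


def mean_dist_to_fold_centroid(coords, folds):
    """Average distance of each point to the centroid of its own fold (smaller = more compact)."""
    total, count = 0.0, 0
    for fold in folds:
        if not fold:
            continue
        cx = sum(coords[i][0] for i in fold) / len(fold)
        cy = sum(coords[i][1] for i in fold) / len(fold)
        total += sum(math.dist(coords[i], (cx, cy)) for i in fold)
        count += len(fold)
    return total / count


# Synthetic sample sites in three well-separated regions (illustrative data only).
rng = random.Random(42)
coords = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in [(0, 0), (20, 0), (0, 20)] for _ in range(30)]
spatial = kmeans_spatial_folds(coords, k=3)
non_spatial = random_folds(len(coords), k=3)
print(mean_dist_to_fold_centroid(coords, spatial) < mean_dist_to_fold_centroid(coords, non_spatial))  # True
```

Because each spatial fold covers one contiguous region, test observations lie farther from the training observations than under random partitioning, which is what reduces the optimistic bias that spatial autocorrelation introduces into non-spatial estimates.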
However, spatial hyperparameter tuning maintains consistency with the spatial estimation of classifier performance and should therefore be favored over non-spatial hyperparameter optimization. Large performance differences (up to 47%) between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) settings demonstrate the strong need to account for the influence of spatial autocorrelation. Overoptimistic performance estimates may lead to misguided actions in ecological decision-making based on biased model predictions.
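Keeping hyperparameter tuning consistent with spatial performance estimation amounts to nested cross-validation in which both the outer (performance) loop and the inner (tuning) loop use spatial partitions. The sketch below assumes KNN as the learner, with the class probability taken as the share of positive labels among the k nearest neighbors, tunes k over a small grid, and scores predictions with the Brier score (mean squared difference between predicted probability and observed 0/1 outcome). For brevity it swaps in a simple partitioning into strips along the x coordinate rather than any specific scheme from the study; the function names, the k grid and the synthetic data are all hypothetical.

```python
import math
import random


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def knn_proba(train_x, train_y, query, k):
    """KNN probability of class 1: share of positives among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


def strip_folds(idx, coords, n_folds):
    """Spatial partitioning: sort indices by x coordinate and cut them into contiguous strips."""
    ordered = sorted(idx, key=lambda i: coords[i][0])
    n = len(ordered)
    return [ordered[f * n // n_folds:(f + 1) * n // n_folds] for f in range(n_folds)]


def spatial_cv_brier(idx, coords, X, y, k, n_folds):
    """Spatial CV estimate of the Brier score of KNN with neighborhood size k on indices idx."""
    scores = []
    for test in strip_folds(idx, coords, n_folds):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_x, train_y = [X[i] for i in train], [y[i] for i in train]
        preds = [knn_proba(train_x, train_y, X[j], k) for j in test]
        scores.append(brier_score([y[j] for j in test], preds))
    return sum(scores) / len(scores)


def nested_spatial_cv(coords, X, y, k_grid, outer=4, inner=3):
    """Outer spatial folds estimate performance; inner spatial folds of the training data tune k."""
    all_idx = list(range(len(y)))
    results = []
    for test in strip_folds(all_idx, coords, outer):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        # Tuning sees only the outer training set, which is itself partitioned spatially.
        best_k = min(k_grid, key=lambda k: spatial_cv_brier(train, coords, X, y, k, inner))
        train_x, train_y = [X[i] for i in train], [y[i] for i in train]
        preds = [knn_proba(train_x, train_y, X[j], best_k) for j in test]
        results.append((best_k, brier_score([y[j] for j in test], preds)))
    return results


# Synthetic data (illustrative only): one spatially smooth covariate drives the 0/1 outcome.
rng = random.Random(1)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(120)]
X = [(x / 100 + rng.gauss(0, 0.1),) for x, _ in coords]
y = [1 if rng.random() < f[0] else 0 for f in X]

print(round(brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.5]), 3))  # 0.115
results = nested_spatial_cv(coords, X, y, k_grid=[3, 7, 15])
for best_k, score in results:
    print(f"tuned k = {best_k}, outer-fold Brier score = {score:.3f}")
```

The inner folds are built only from the outer training indices, so the held-out strip never influences the choice of k. Replacing `strip_folds` in the inner loop with random folds would give the non-spatial tuning variant that the study compares against.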
