Article

Predictive performance of presence-only species distribution models: a benchmark study with reproducible code

ECOLOGICAL MONOGRAPHS (2022)

Journal

ECOLOGICAL MONOGRAPHS

Volume 92, Issue 1, Pages -

Publisher

WILEY

DOI: 10.1002/ecm.1486

Keywords

boosted regression trees; down sampling; ecological niche model; ensemble modeling; imbalanced data; independent test data; machine learning; maxent; model evaluation; point process weighting; presence-background; random forest

Category

Ecology

Funding

Australian Government Research Training Program Scholarship
Rowden White Scholarship
Australian Research Council (ARC) Discovery Early Career Researcher Award [DE160100904]
ARC Discovery Project [160101003]
National Centre for Ecological Analysis and Synthesis, Santa Barbara, California [4980]
NCEAS

Summary

This study reanalyzed a dataset of 225 species from six different regions to explore patterns in predictive performance across different methods. It found that the way models are fitted matters, with some emerging methods outperforming traditional regression algorithms.

Abstract

Species distribution modeling (SDM) is widely used in ecology and conservation. Currently, the most widely available data for SDM are presence-only species records (available through digital databases). There have been many studies comparing the performance of alternative algorithms for modeling presence-only data. Among these, a 2006 paper from Elith and colleagues has been particularly influential in the field, partly because they used several methods that were novel at the time on a global data set that included independent presence-absence records for model evaluation. Since its publication, some of the algorithms have been further developed and new ones have emerged. In this paper, we explore patterns in predictive performance across methods by reanalyzing the same data set (225 species from six different regions) using updated modeling knowledge and practices. We apply well-established methods such as generalized additive models and MaxEnt, alongside others that have received attention more recently, including regularized regressions, point-process weighted regressions, random forests, XGBoost, support vector machines, and the ensemble modeling framework biomod. All the methods we use include background samples (a sample of environments in the landscape) for model fitting. We explore the impacts of using weights on the presence and background points in model fitting. We introduce new ways of evaluating models fitted to these data, using the area under the precision-recall gain curve and focusing on the rank of results. We find that the way models are fitted matters. The top method was an ensemble of tuned individual models. In contrast, ensembles built using the biomod framework with default parameters performed no better than single moderately performing models. Similarly, the second-best performing method was a random forest parameterized to deal with many background samples (in contrast to relatively few presence records), which substantially outperformed other random forest implementations. We find that, in general, nonparametric techniques with the capability of controlling for model complexity outperformed traditional regression methods, with MaxEnt and boosted regression trees still among the top performing models. All the data and code with working examples are provided to make this study fully reproducible.
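
The abstract mentions evaluation with the area under the precision-recall gain curve. As a rough illustration only, the sketch below computes the points of such a curve in plain Python, using the commonly cited definition in which precision and recall are rescaled against the baseline prevalence pi (the proportion of positives in the evaluation data). The function names `gain` and `prg_points` and the toy data are hypothetical, and the authors' own code may compute the curve and its area differently (for example, with interpolation between points).

```python
from typing import List, Tuple


def gain(value: float, pi: float) -> float:
    """Rescale precision or recall to the gain scale.

    A value equal to the baseline prevalence pi maps to 0,
    and a perfect value of 1 maps to 1.
    """
    return (value - pi) / ((1.0 - pi) * value)


def prg_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (recall_gain, precision_gain) at each distinct score threshold.

    labels: 1 for presence, 0 for absence in the independent test data.
    scores: model predictions, higher meaning more suitable.
    """
    n_pos = sum(labels)
    pi = n_pos / len(labels)  # baseline prevalence of positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item of a group of tied scores.
        last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_tie and tp > 0:
            precision = tp / (tp + fp)
            recall = tp / n_pos
            points.append((gain(recall, pi), gain(precision, pi)))
    return points


# Toy example (made-up values): two presences ranked above two absences.
toy_points = prg_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```

On the gain scale, a model that does no better than always predicting presence sits at a precision gain of 0, which makes scores more comparable across species with different prevalence. Integrating precision gain over recall gain between 0 and 1 would then give an area-style summary; that step is left out here because the interpolation details are not described in the abstract.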
