Article

Predictive performance of presence-only species distribution models: a benchmark study with reproducible code

Journal

ECOLOGICAL MONOGRAPHS
Volume 92, Issue 1, Pages -

Publisher

WILEY
DOI: 10.1002/ecm.1486

Keywords

boosted regression trees; down sampling; ecological niche model; ensemble modeling; imbalanced data; independent test data; machine learning; maxent; model evaluation; point process weighting; presence-background; random forest

Funding

  1. Australian Government Research Training Program Scholarship
  2. Rowden White Scholarship
  3. Australian Research Council (ARC) Discovery Early Career Researcher Award [DE160100904]
  4. ARC Discovery Project [160101003]
  5. National Centre for Ecological Analysis and Synthesis, Santa Barbara, California [4980]
  6. NCEAS

This study reanalyzed a dataset of 225 species from six different regions to explore patterns in predictive performance across different methods. It found that the way models are fitted matters, with some emerging methods outperforming traditional regression algorithms.
Species distribution modeling (SDM) is widely used in ecology and conservation. Currently, the most available data for SDM are species presence-only records (available through digital databases). There have been many studies comparing the performance of alternative algorithms for modeling presence-only data. Among these, a 2006 paper from Elith and colleagues has been particularly influential in the field, partly because they used several novel methods (at the time) on a global data set that included independent presence-absence records for model evaluation. Since its publication, some of the algorithms have been further developed and new ones have emerged. In this paper, we explore patterns in predictive performance across methods, by reanalyzing the same data set (225 species from six different regions) using updated modeling knowledge and practices. We apply well-established methods such as generalized additive models and MaxEnt, alongside others that have received attention more recently, including regularized regressions, point-process weighted regressions, random forests, XGBoost, support vector machines, and the ensemble modeling framework biomod. All the methods we use include background samples (a sample of environments in the landscape) for model fitting. We explore impacts of using weights on the presence and background points in model fitting. We introduce new ways of evaluating models fitted to these data, using the area under the precision-recall gain curve, and focusing on the rank of results. We find that the way models are fitted matters. The top method was an ensemble of tuned individual models. In contrast, ensembles built using the biomod framework with default parameters performed no better than single moderately performing models. Similarly, the second-best-performing method was a random forest parameterized to deal with many background samples (in contrast to relatively few presence records), which substantially outperformed other random forest implementations. We find that, in general, nonparametric techniques with the capability of controlling for model complexity outperformed traditional regression methods, with MaxEnt and boosted regression trees still among the top-performing models. All the data and code with working examples are provided to make this study fully reproducible.
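
The abstract does not say how the random forest was parameterized to cope with many background samples. Since the keywords list down sampling and imbalanced data, one plausible reading is that each tree is grown on a bootstrap sample holding equal numbers of presence and background points. The Python sketch below illustrates only that sampling step under this assumption; it is not the authors' code, and `balanced_bootstrap` and the toy label vector are hypothetical names and data introduced for illustration.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices for one tree's training sample.

    Draws, with replacement, as many background points (label 0) as there
    are presence records (label 1), so every tree sees a 50:50 class ratio
    no matter how large the background sample is.
    """
    presence_idx = [i for i, y in enumerate(labels) if y == 1]
    background_idx = [i for i, y in enumerate(labels) if y == 0]
    if not presence_idx or not background_idx:
        raise ValueError("need at least one presence and one background point")
    n = len(presence_idx)
    sample = rng.choices(presence_idx, k=n) + rng.choices(background_idx, k=n)
    rng.shuffle(sample)
    return sample


# Hypothetical toy data: 20 presences against 1000 background points.
labels = [1] * 20 + [0] * 1000
rng = random.Random(42)

# One balanced sample per tree; a forest would fit one tree to each.
per_tree_samples = [balanced_bootstrap(labels, rng) for _ in range(5)]
class_counts = [Counter(labels[i] for i in s) for s in per_tree_samples]
```

Each entry of `per_tree_samples` could then be used to fit one decision tree. Because the background points are redrawn for every tree, the forest as a whole still sees most of the background sample even though no single tree is dominated by it.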
