Article

Effects of simulated observation errors on the performance of species distribution models

Journal

DIVERSITY AND DISTRIBUTIONS
Volume 25, Issue 3, Pages 400-413

Publisher

WILEY
DOI: 10.1111/ddi.12868

Keywords

artificial data; AUC; ecological niche models; evaluation metric; habitat suitability models; Kappa; model fit; predictive accuracy; TSS; uncertainty

Funding

  1. SNF project SESAM'ALP: Challenges in simulating alpine species assemblages under global change [31003A-1528661]

Abstract

Aim: Species distribution information is essential under increasing global change, and models can be used to acquire such information, but model outputs can be affected by various errors and biases. Here, we evaluated the degree to which errors in species data (false presences and false absences) affect model predictions, and how this is reflected in commonly used evaluation metrics.

Location: Western Swiss Alps.

Methods: Using 100 virtual species and different sampling methods, we created observation datasets of three sizes (100, 400 and 1,600 points) and added increasing levels of error (creating false positives or false negatives, from 0% to 50%). These degraded datasets were used to fit models with generalized linear models (GLM), random forests (RF) and boosted regression trees (BRT). Model fit (the ability to reproduce the calibration data) and predictive success (the ability to predict the true distribution) were measured on probabilistic and binary outputs using Kappa, TSS, MaxKappa, MaxTSS and Somers' D (rescaled AUC).

Results: The interpretation of model performance depended on the data and the metrics used for evaluation, with conclusions differing according to whether model fit or predictive success was measured. Added errors reduced model performance, and, as expected, the effect weakened as sample size increased. Performance was more affected by false positives than by false negatives. Modelling techniques were affected differently by errors: techniques with high model fit showed lower predictive success (RF), and vice versa (GLM). High evaluation scores could still be obtained with 30% added error, indicating that some metrics (e.g., Somers' D) may not be sensitive enough to detect data degradation.

Main conclusions: Our findings highlight the need to reconsider the interpretation scale of some commonly used evaluation metrics: Kappa appears more realistic than Somers' D/AUC or TSS. High model fit was obtained even at high levels of added error, showing that RF overfits the data. When compiling occurrence databases, it is advisable to reduce the rate of false positives (or to increase sample size) rather than the rate of false negatives.
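For reference, the metrics named above are related as follows: Somers' D = 2 x AUC - 1 (mapping AUC onto [-1, 1], so that a random model scores 0), TSS = sensitivity + specificity - 1, and Kappa is chance-corrected agreement. The sketch below is not the authors' code; it is a minimal illustration of the error-injection experiment described in the Methods, assuming a logistic virtual species on two environmental gradients and using NumPy and scikit-learn. All names (add_errors, the 30% error rate, the RF model choice) are illustrative.

```python
# Minimal sketch (illustrative, not the authors' code) of the experiment:
# flip a fraction of presence/absence labels, fit a model, and evaluate it
# against both the degraded calibration data and the true distribution.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(42)

# Assumed virtual species: occurrence driven by two environmental gradients.
n = 400
env = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(2 * env[:, 0] - env[:, 1])))
y_true = rng.binomial(1, p_true)

def add_errors(y, rate, kind, rng):
    """Flip a fraction `rate` of labels: presences -> 0 (false negatives)
    or absences -> 1 (false positives)."""
    y = y.copy()
    idx = np.flatnonzero(y == (1 if kind == "false_negative" else 0))
    flip = rng.choice(idx, size=int(rate * idx.size), replace=False)
    y[flip] = 1 - y[flip]
    return y

# Degrade the calibration data with 30% false positives.
y_degraded = add_errors(y_true, rate=0.30, kind="false_positive", rng=rng)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(env, y_degraded)
prob = model.predict_proba(env)[:, 1]
pred = (prob >= 0.5).astype(int)

# Threshold-dependent metrics computed against the TRUE distribution.
tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
sens, spec = tp / (tp + fn), tn / (tn + fp)
print("TSS       :", sens + spec - 1)
print("Somers' D :", 2 * roc_auc_score(y_true, prob) - 1)  # rescaled AUC

# The paper's key distinction: model fit (agreement with the degraded
# calibration data) vs. predictive success (agreement with the truth).
print("Kappa, model fit          :", cohen_kappa_score(y_degraded, pred))
print("Kappa, predictive success :", cohen_kappa_score(y_true, pred))
```

With an overfitting learner such as an untuned RF, the fit score tends to stay high while the predictive-success score drops as errors are added, which mirrors the RF vs. GLM contrast reported in the Results.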
