Article

Predicting into unknown space? Estimating the area of applicability of spatial prediction models

Journal

Methods in Ecology and Evolution
Volume 12, Issue 9, Pages 1620-1633

Publisher

Wiley
DOI: 10.1111/2041-210X.13650

Keywords

machine learning; model transferability; predictive modelling; Random Forest; remote sensing; spatial mapping; uncertainty

Funding

  1. Projekt DEAL

Machine learning algorithms are popular for spatial mapping because they can fit complex relationships, but they are reliable only on data similar to the training set. The study proposes a method to assess the area in which a prediction model can be reliably applied: a dissimilarity index (DI) is used to delineate the area of applicability (AOA) and to map estimated performance. Simulation studies show that prediction errors within the AOA are comparable to cross-validation errors, underlining the importance of the relationship between the DI and cross-validation performance.
Machine learning algorithms have become very popular for spatial mapping of the environment due to their ability to fit nonlinear and complex relationships. However, this ability comes with the disadvantage that they can only be applied to new data if these are similar to the training data. Since spatial mapping requires predictions into new geographic space, which in many cases goes along with new predictor properties, a method is required to assess the area to which a prediction model can be reliably applied. Here, we suggest a methodology that delineates the 'area of applicability' (AOA), which we define as the area where we enabled the model to learn about relationships based on the training data, and where the estimated cross-validation performance holds. We first propose a 'dissimilarity index' (DI) that is based on the minimum distance to the training data in the multidimensional predictor space, with predictors being weighted by their respective importance in the model. The AOA is then derived by applying a threshold, which is the (outlier-removed) maximum DI of the training data derived via cross-validation. We further use the relationship between the DI and the cross-validation performance to map the estimated performance of predictions. We illustrate the approach in a simulated case study chosen to mimic ecological realities and test its credibility using a large set of simulated data. The simulation studies showed that the prediction error within the AOA is comparable to the cross-validation error of the trained model, while the cross-validation error does not apply outside the AOA. This holds for models trained with randomly distributed training data, as well as for training data that are clustered in space and where spatial cross-validation is applied. Using the relationship between DI and cross-validation performance showed potential to limit predictions to the area where a user-defined performance applies. We suggest adding the AOA computation to the modeller's standard toolkit and presenting predictions for the AOA only. We further suggest reporting a map of DI-dependent performance estimates alongside prediction maps, complementary to (cross-)validation performance measures and common uncertainty estimates.
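
The DI and AOA computations described in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation: the function names, the normalisation constant, and the boxplot outlier rule are assumptions based on the abstract's wording (the authors' reference implementation is the aoa() function in their R package CAST).

```python
# Minimal sketch of a dissimilarity index (DI) and AOA threshold as
# described in the abstract. Names and details are illustrative only.
import numpy as np

def dissimilarity_index(train, new, weights):
    """DI for new points: importance-weighted minimum distance to the
    training data in predictor space, normalised by the mean pairwise
    distance within the (weighted) training data."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    t = (train - mu) / sd * weights          # scaled, importance-weighted
    x = (new - mu) / sd * weights
    # (m, n) distance matrix between new and training points
    d = np.linalg.norm(x[:, None, :] - t[None, :, :], axis=2)
    d_min = d.min(axis=1)                    # nearest training point
    pair = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=2)
    d_bar = pair[np.triu_indices_from(pair, k=1)].mean()
    return d_min / d_bar

def aoa_threshold(train, weights, folds):
    """Threshold = (outlier-removed) maximum DI of the training data,
    where each point's nearest neighbour is restricted to other
    cross-validation folds (mimicking 'derived via cross-validation')."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    t = (train - mu) / sd * weights
    pair = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=2)
    d_cv = np.array([pair[i][folds != folds[i]].min()
                     for i in range(len(t))])
    d_bar = pair[np.triu_indices_from(pair, k=1)].mean()
    di_train = d_cv / d_bar
    # Outlier removal via the boxplot whisker rule (an assumption here)
    q1, q3 = np.percentile(di_train, [25, 75])
    return di_train[di_train <= q3 + 1.5 * (q3 - q1)].max()
```

Predictions with a DI above the threshold would then be masked out as outside the AOA, e.g. `inside = dissimilarity_index(X_train, X_new, w) <= aoa_threshold(X_train, w, fold_ids)`.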
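The final step the abstract mentions, mapping DI-dependent performance, can be sketched in a similarly hedged way. Binning the cross-validated errors by DI and interpolating is one plausible way to model the DI-performance relationship; it is not necessarily the authors' choice, and `performance_from_di` is a hypothetical name.

```python
# Hypothetical sketch: estimate expected error as a function of the DI,
# using cross-validated absolute errors of the training points.
import numpy as np

def performance_from_di(di_train, cv_abs_errors, di_new, n_bins=10):
    """Bin training DI values, average the CV error per bin, and
    interpolate to get a DI-dependent error estimate for new data."""
    edges = np.quantile(di_train, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(di_train, edges[1:-1])   # bin index per point
    centers, errs = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            centers.append(di_train[mask].mean())
            errs.append(cv_abs_errors[mask].mean())
    # Piecewise-linear interpolation of the DI-error relationship;
    # np.interp clamps values beyond the largest training DI (outside
    # the AOA the relationship is not expected to hold anyway).
    return np.interp(di_new, centers, errs)
```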

Authors

Hanna Meyer, Edzer Pebesma
