Article

Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation

Journal

ENVIRONMENTAL MODELLING & SOFTWARE
Volume 101, Pages 1-9

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.envsoft.2017.12.001

Keywords

Cross-validation; Feature selection; Over-fitting; Random forest; Spatio-temporal; Target-oriented validation

Funding

  1. Federal Ministry of Education and Research (BMBF) within the IDESSA project, SPACES-program (Science Partnership for the Assessment of Complex Earth System processes) [01LL1301]
  2. Ross Sea Region Terrestrial Data Analysis research program - Ministry of Business and Innovation, New Zealand [CO9X1413]

Importance of target-oriented validation strategies for spatio-temporal prediction models is illustrated using two case studies: (1) modelling of air temperature (T_air) in Antarctica, and (2) modelling of volumetric water content (VW) for the R.J. Cook Agronomy Farm, USA. Performance of a random k-fold cross-validation (CV) was compared to three target-oriented strategies: Leave-Location-Out (LLO), Leave-Time-Out (LTO), and Leave-Location-and-Time-Out (LLTO) CV. Results indicate that considerable differences between random k-fold (R² = 0.9 for T_air and 0.92 for VW) and target-oriented CV (LLO R² = 0.24 for T_air and 0.49 for VW) exist, highlighting the need for target-oriented validation to avoid an overoptimistic view on models. Differences between random k-fold and target-oriented CV indicate spatial over-fitting caused by misleading variables. To decrease over-fitting, a forward feature selection in conjunction with target-oriented CV is proposed. It decreased over-fitting and simultaneously improved target-oriented performances (LLO CV R² = 0.47 for T_air and 0.55 for VW). © 2017 Elsevier Ltd. All rights reserved.
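
The validation strategies named in the abstract differ only in how samples are assigned to folds, which a short sketch can make concrete. The Python below is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, folds are built by assigning whole locations or time steps at random, the LLTO variant assumes that training data must share neither the held-out locations nor the held-out time steps, and the forward selection is a generic greedy loop whose `score` callback is assumed to return a target-oriented CV score (for example an LLO R²) for a given feature subset.

```python
import random


def random_kfold_pairs(n_samples, k, seed=0):
    """Random k-fold: samples are shuffled individually, ignoring location and time."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = sorted(idx[i::k])
        held = set(test)
        train = [j for j in range(n_samples) if j not in held]
        yield train, test


def group_fold_map(groups, k, seed=0):
    """Assign every distinct group (a location or a time step) to one of k folds."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    return {g: i % k for i, g in enumerate(unique)}


def leave_group_out_pairs(groups, k, seed=0):
    """LLO when `groups` holds locations, LTO when it holds time steps.

    A group is never split between training and test data.
    """
    fold_of = group_fold_map(groups, k, seed)
    for i in range(k):
        test = [j for j, g in enumerate(groups) if fold_of[g] == i]
        train = [j for j, g in enumerate(groups) if fold_of[g] != i]
        yield train, test


def llto_pairs(locations, times, k, seed=0):
    """LLTO (assumed variant): test on held-out locations at held-out times.

    Training samples share neither a location fold nor a time fold with the
    test samples; samples that share only one of the two are left unused.
    """
    loc_fold = group_fold_map(locations, k, seed)
    time_fold = group_fold_map(times, k, seed + 1)
    for i in range(k):
        train, test = [], []
        for j, (loc, t) in enumerate(zip(locations, times)):
            in_loc = loc_fold[loc] == i
            in_time = time_fold[t] == i
            if in_loc and in_time:
                test.append(j)
            elif not in_loc and not in_time:
                train.append(j)
        yield train, test


def forward_selection(features, score):
    """Greedy forward selection: add the feature that most improves `score`.

    `score(subset)` is a caller-supplied function (higher is better), assumed
    to evaluate a model with a target-oriented CV. The loop stops as soon as
    no remaining feature improves the best score so far.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best
```

Scoring each candidate subset with a leave-location-out split rather than a random split is what penalises variables that only help the model memorise individual locations, which is the kind of over-fitting the abstract describes.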
