Article

Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning

Journal

CANADIAN JOURNAL OF FOREST RESEARCH
Volume 52, Issue 3, Pages 385-395

Publisher

CANADIAN SCIENCE PUBLISHING
DOI: 10.1139/cjfr-2021-0192

Keywords

LiDAR; machine learning; remote sensing; area-based approach; sampling size

Funding

  1. Forest Research Centre
  2. Fundação para a Ciência e a Tecnologia I.P. (FCT), Portugal [UIDB/00239/2020]
  3. Fundação para a Ciência e a Tecnologia I.P. (FCT) [PD/BD/128489/2017]
  4. Academy of Finland through the project Unmanned Aerial Vehicles in Forest Remote Sensing under the UNITE flagship ecosystem [323484, 337655]

This study examines how many predictors and training plots are needed for accurate prediction without overfitting in various models used in the area-based approach. The findings suggest that some models tend to overfit when the number of predictors approaches the number of training plots. For most models, however, larger datasets yield more accurate predictions.
Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m³·ha⁻¹) for four boreal sites using ABA with 2-39 predictors and 20-500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥ 15 plots per predictor to provide accurate predictions (RMSE ≤ 30%). GAM required ≥ 250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥ 200 and ≥ 250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfitted even with 500 training plots. Overall, using up to 39 predictors did not generally result in overfitting, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥ 250 plots).
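
The overfitting behaviour described for OLS can be illustrated with a minimal sketch. It does not reproduce the study's data or models: the predictors and response are synthetic, the function names `relative_rmse` and `ols_train_test_rmse` are hypothetical, and the signal and noise levels are assumptions chosen only to give a volume-like response. The sketch fits OLS on a small training set and compares training and test RMSE (as a percentage of the mean) as the number of predictors approaches the number of training plots.

```python
import numpy as np


def relative_rmse(y_true, y_pred):
    """RMSE expressed as a percentage of the mean observed value."""
    return 100.0 * np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true)


def ols_train_test_rmse(n_train, n_predictors, n_test=1000, seed=0):
    """Fit OLS on synthetic plots; return (train RMSE %, test RMSE %).

    Assumption: only the first two predictors carry signal, the rest are
    pure noise, so extra predictors can only add estimation variance.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.normal(size=(n, n_predictors))
    # Synthetic "timber volume" with a mean of 150 and noise SD of 30 (assumed).
    y = 150.0 + 30.0 * X[:, 0] + 15.0 * X[:, 1] + rng.normal(scale=30.0, size=n)

    # Add an intercept column and split into training and test plots.
    A = np.column_stack([np.ones(n), X])
    A_train, A_test = A[:n_train], A[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    # Ordinary least squares via a least-squares solve.
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return (
        relative_rmse(y_train, A_train @ coef),
        relative_rmse(y_test, A_test @ coef),
    )


# With 20 training plots, increase the number of predictors towards 20.
for p in (2, 10, 18):
    train_err, test_err = ols_train_test_rmse(n_train=20, n_predictors=p)
    print(f"predictors={p:2d}  train RMSE={train_err:6.1f}%  test RMSE={test_err:6.1f}%")
```

A widening gap between training and test RMSE as `n_predictors` grows is the signature of overfitting that the abstract reports for OLS; rerunning with a larger `n_train` shows how more plots per predictor narrow that gap in this synthetic setting.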
