Article

Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock

Journal

FORESTRY
Volume 94, Issue 2, Pages 311-323

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/forestry/cpaa034

Funding

  1. Forest Research Centre
  2. Fundacao para a Ciencia e a Tecnologia I.P. (FCT), Portugal [UIDB/00239/2020]
  3. Fundação para a Ciência e a Tecnologia (FCT) [PD/BD/128489/2017]

This study compared the performance of ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF) in forest yield modelling. OLS and RF had similar accuracies, and both were more accurate than kNN. Variable selection did not affect RF performance, and heuristic and exhaustive selection performed similarly for OLS. Caution is advised when building kNN models for volume prediction; OLS with variable selection, or RF with all variables included, is preferable.
In this study, for five sites around the world, we examine the effects of different model types and variable selection approaches on forest yield modelling performance in an area-based approach (ABA). We compared ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF). Our objective was to test whether there are systematic differences in accuracy between OLS, kNN and RF in ABA predictions of growing stock volume. The analyses are based on a 5-fold cross-validation at five study sites: a eucalyptus plantation, a temperate forest and three different boreal forests. Two completely independent validation datasets were also available for two of the boreal sites. For kNN, we evaluated multiple distance measures, including Euclidean, Mahalanobis, most similar neighbour (MSN) and an RF-based distance metric. The variable selection approaches we examined included a heuristic approach (for OLS, kNN and RF), an exhaustive search among all combinations (OLS only) and all variables together (RF only). Performance varied among sites by model type and variable selection approach. OLS and RF had similar accuracies and were more efficient than any of the kNN variants. Variable selection did not affect RF performance. Heuristic and exhaustive variable selection performed similarly for OLS. kNN performed the poorest among the model types, and kNN with RF distance was prone to overfitting when compared with a validation dataset. Additional caution is therefore required when building kNN models for volume prediction through ABA; it is preferable to opt for models based on OLS with some variable selection, or RF with all variables together.
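
The 5-fold cross-validation comparison described above can be illustrated with a minimal sketch. This is not the authors' code or data: the plots are synthetic, the two predictors (`h_mean`, `cover`) are hypothetical stand-ins for laser-scanning metrics, the coefficients and noise level are arbitrary, and only OLS and kNN with Euclidean distance are shown (RF and the other distance metrics are omitted for brevity). All function names are assumptions made for illustration.

```python
import math
import random


def make_plots(n, seed=0):
    """Synthetic field plots: two hypothetical ALS metrics and a noisy linear volume."""
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        h_mean = rng.uniform(5.0, 30.0)   # hypothetical mean height metric (m)
        cover = rng.uniform(0.2, 1.0)     # hypothetical canopy cover metric
        volume = 12.0 * h_mean + 80.0 * cover + rng.gauss(0.0, 20.0)
        plots.append(([h_mean, cover], volume))
    return plots


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(train):
    """Fit OLS with an intercept via the normal equations (X'X) beta = X'y."""
    rows = [[1.0] + x for x, _ in train]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, (_, y) in zip(rows, train)) for i in range(p)]
    beta = solve(xtx, xty)
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_knn(train, k=5):
    """Predict as the mean of the k nearest training plots (Euclidean distance
    on predictors scaled by their training-fold standard deviation)."""
    p = len(train[0][0])
    means = [sum(x[j] for x, _ in train) / len(train) for j in range(p)]
    sds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in train) / len(train))
           for j in range(p)]

    def predict(x):
        dists = sorted(
            (math.sqrt(sum(((x[j] - t[j]) / sds[j]) ** 2 for j in range(p))), y)
            for t, y in train
        )
        return sum(y for _, y in dists[:k]) / k

    return predict


def cross_validated_rmse(plots, fit, n_folds=5, seed=1):
    """Each plot is predicted once by a model fitted on the other folds."""
    idx = list(range(len(plots)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [plots[i] for i in idx if i not in held_out]
        predict = fit(train)
        sq_err += sum((predict(plots[i][0]) - plots[i][1]) ** 2 for i in fold)
    return math.sqrt(sq_err / len(plots))


plots = make_plots(100)
rmse = {name: cross_validated_rmse(plots, fit)
        for name, fit in [("OLS", fit_ols), ("kNN", fit_knn)]}
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.1f}")
```

Because the synthetic volume is linear in the predictors by construction, this toy setup says nothing about which method is better on real forest data; it only shows how the same folds can be reused to score different model types on an equal footing.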
