4.7 Article

Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Journal

FORESTS
Volume 12, Issue 2, Pages -

Publisher

MDPI
DOI: 10.3390/f12020216

Keywords

feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type

Funding

  1. National Natural Science Foundation of China [31870620]
  2. National Technology Extension Fund of Forestry, Forest Vegetation Carbon Storage Monitoring Technology Based on Watershed Algorithm [[2019]06]


Increasing the number of explanatory variables in quantitative remote sensing of forest aboveground biomass (AGB) can lead to information redundancy and the curse of dimensionality. Feature selection improves the accuracy of AGB estimates, and the CatBoost algorithm improved accuracy relative to random forest regression and XGBoost. The choice of feature selection method and machine learning algorithm, and how the two are combined, significantly affects the performance of AGB estimation models.
Increasing numbers of explanatory variables tend to result in information redundancy and the curse of dimensionality in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forest, and 25.77 Mg/ha for all forests.
The combination of RFE and CatBoost performed better than the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
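
The idea of decoupling feature selection from regression can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic rather than Landsat OLI predictors, the RFE step is assumed to rank features with a random forest, and scikit-learn's `GradientBoostingRegressor` stands in for CatBoost so that the example depends only on scikit-learn and NumPy. The sample size, feature counts, and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for plot-level data (assumption): 300 samples,
# 30 candidate predictors, of which 8 actually carry signal.
X, y = make_regression(n_samples=300, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 1: recursive feature elimination. The ranking model here is a
# random forest (assumption); two features are dropped per iteration
# until 10 remain (hypothetical target size).
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=10,
    step=2,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)  # column indices that survived

# Step 2: regression with a *different* algorithm than the one used for
# selection. GradientBoostingRegressor is a stand-in for CatBoost.
regressor = GradientBoostingRegressor(random_state=0)
regressor.fit(X_train[:, selected], y_train)

# Evaluate on held-out samples with RMSE, the metric quoted in the abstract.
pred = regressor.predict(X_test[:, selected])
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"selected {selected.size} of {X.shape[1]} features, RMSE = {rmse:.2f}")
```

Where the `catboost` package is installed, `catboost.CatBoostRegressor` could presumably take the place of the stand-in regressor in step 2. Comparing several selector and regressor pairings on the same held-out split is the kind of comparison the abstract describes.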
