Article

SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 36, Issue 5, Pages 2006-2032

Publisher

SPRINGER
DOI: 10.1007/s10618-022-00858-9

Keywords

Data streams; Regression; KNN; ARF-Reg

Funding

  1. CAUL


Most research in machine learning for data streams has focused on classification algorithms, whereas regression methods have received far less attention. This paper proposes Self-Optimising K-Nearest Leaves (SOKNL), a novel forest-based algorithm for streaming regression problems. Specifically, we extend Adaptive Random Forest Regression (ARF-Reg), a state-of-the-art online regression algorithm, as follows: in each leaf, a representative data point (also called a centroid) is generated by compressing the information from all instances in that leaf. During the prediction step, instead of letting all trees in the forest participate, the distances between the input instance and the centroids of the relevant leaves are calculated, and only the k trees with the smallest distances are utilised for the prediction. Furthermore, we simplify the algorithm by introducing a mechanism that tunes the value of k dynamically and automatically based on historical information. This new algorithm produces promising predictive results and achieves a superior ranking according to statistical testing when compared with several standard stream regression methods over typical benchmark datasets. This improvement incurs only a small increase in runtime and memory consumption over the basic Adaptive Random Forest Regressor.
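The prediction step described in the abstract (leaf centroids, selecting the k closest trees, and tuning k from past errors) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions, all hypothetical: each tree has already routed the instance to one leaf (tree growth and routing are not shown); a leaf's centroid is the running mean of the feature vectors it has seen; a leaf predicts the mean of its targets; distance is Euclidean; the k selected leaf predictions are averaged; and k is chosen as the value with the lowest cumulative absolute error on past instances. The names `Leaf`, `rank_leaves`, `k_nearest_leaves_predict` and `KSelector` are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Leaf:
    """Hypothetical leaf summary: running sums stand in for the instances seen."""
    n_features: int
    count: int = 0
    target_sum: float = 0.0
    feature_sums: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.feature_sums:
            self.feature_sums = [0.0] * self.n_features

    def learn(self, x: Sequence[float], y: float) -> None:
        # Compress the instance into running sums instead of storing it.
        self.count += 1
        for i, v in enumerate(x):
            self.feature_sums[i] += v
        self.target_sum += y

    def centroid(self) -> List[float]:
        # Mean feature vector of all instances that reached this leaf.
        return [s / self.count for s in self.feature_sums]

    def predict(self) -> float:
        # Assumed leaf model: mean target (a stand-in for ARF-Reg's leaf predictor).
        return self.target_sum / self.count


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def rank_leaves(leaves: Sequence[Leaf], x: Sequence[float]) -> List[Leaf]:
    # Order the non-empty leaves (one per tree) by centroid distance to x.
    non_empty = [leaf for leaf in leaves if leaf.count > 0]
    return sorted(non_empty, key=lambda leaf: euclidean(leaf.centroid(), x))


def k_nearest_leaves_predict(leaves: Sequence[Leaf], x: Sequence[float], k: int) -> float:
    # Only the k trees whose leaf centroids are closest to x contribute.
    chosen = rank_leaves(leaves, x)[:k]
    if not chosen:
        raise ValueError("no trained leaves available")
    return sum(leaf.predict() for leaf in chosen) / len(chosen)


class KSelector:
    """Hypothetical self-optimising k: track the cumulative absolute error of every k."""

    def __init__(self, max_k: int) -> None:
        self.errors = [0.0] * max_k  # errors[i] belongs to k = i + 1

    def best_k(self) -> int:
        # Lowest historical error wins; ties go to the smaller k.
        return min(range(len(self.errors)), key=lambda i: self.errors[i]) + 1

    def update(self, leaves: Sequence[Leaf], x: Sequence[float], y: float) -> None:
        # Once the true target y arrives, score every candidate k on this instance.
        ranked = rank_leaves(leaves, x)
        if not ranked:
            return
        prefix = 0.0
        for i in range(len(self.errors)):
            if i < len(ranked):
                prefix += ranked[i].predict()
            used = min(i + 1, len(ranked))  # k larger than the leaf count uses all leaves
            self.errors[i] += abs(prefix / used - y)


# Usage: three trees, each contributing the leaf that the query instance falls into.
leaves = [Leaf(2), Leaf(2), Leaf(2)]
leaves[0].learn([0.0, 0.0], 1.0)
leaves[1].learn([1.0, 1.0], 2.0)
leaves[2].learn([10.0, 10.0], 100.0)
x = [0.2, 0.2]
selector = KSelector(max_k=3)
selector.update(leaves, x, 1.4)
k = selector.best_k()
print(k, k_nearest_leaves_predict(leaves, x, k))  # 2 1.5
```

In this toy run the far-away leaf (centroid at [10, 10]) would pull an all-tree average towards 34.3, whereas restricting the vote to the two nearest leaves yields 1.5. Keeping running sums rather than raw instances is one way to keep the per-leaf memory overhead constant, in line with the small extra cost the abstract reports.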
