Article

A Weighted k-Nearest Neighbours Ensemble With Added Accuracy and Diversity

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 125920-125929

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3225682

Keywords

Classification; feature weighting; k-nearest neighbor ensemble; support vectors

The paper proposes a $k$NN ensemble method that identifies nearest observations using distances weighted via support vectors, yielding better classification performance on datasets with noisy features. The estimated class of a test observation is decided by majority voting, and the method outperforms competing methods in most cases.
Ensembles based on $k$NN models are considered effective in reducing the adverse effect of outliers, primarily by identifying the observations closest to a test point in the training data. The class label of the test point is then estimated by a majority vote of the nearest observations' class labels. While identifying the closest observations, certain training patterns might possess higher regulatory power than others; assigning weights to observations and then calculating weighted distances is therefore important in addressing this scenario. This paper proposes a $k$NN ensemble that identifies nearest observations based on their weighted distance, in relation to the response variable, via support vectors. This is done by building a large number of $k$NN models, each on a bootstrap sample from the training data along with a randomly selected subset of features from the given feature space. The estimated class of a test observation is decided by majority voting over the estimates given by all the base $k$NN models. The ensemble is assessed on 14 benchmark and simulated datasets against other classical methods, including $k$NN-based models, using the Brier score, classification accuracy, and Cohen's kappa as performance measures. On both the benchmark and simulated datasets, the proposed ensemble outperformed the competing methods in the majority of cases, and it gave better overall classification performance than the other methods on 8 datasets. The analyses on simulated datasets reveal that the proposed method is effective in classification problems that involve noisy features in the data. Furthermore, feature weighting and randomization also make the method robust to the choice of $k$, i.e., the number of nearest observations in a base model.
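The ensemble procedure described in the abstract (bootstrap samples, random feature subsets, weighted-distance $k$NN base learners, majority voting) can be sketched as follows. This is a minimal illustration, not the authors' implementation: in particular, the paper's support-vector-based observation weighting is approximated here by scaling features with the absolute coefficients of a linear SVM, which is an assumption made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

def fit_ensemble(X, y, n_models=50, k=5, seed=0):
    """Fit a bag of kNN base models, each on a bootstrap sample
    and a random feature subset, with SVM-derived feature weights."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    models = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                      # bootstrap sample
        feats = rng.choice(p, size=max(2, p // 2), replace=False)  # random subset
        Xb, yb = X[rows][:, feats], y[rows]
        # Hypothetical stand-in for the paper's support-vector weighting:
        # use |coefficients| of a linear SVM as per-feature weights.
        svm = LinearSVC().fit(Xb, yb)
        w = np.abs(svm.coef_).mean(axis=0) + 1e-8
        # Scaling features by w makes Euclidean distance a weighted distance.
        knn = KNeighborsClassifier(n_neighbors=k).fit(Xb * w, yb)
        models.append((feats, w, knn))
    return models

def predict(models, X):
    """Majority vote over the base models' class estimates."""
    votes = np.stack([knn.predict(X[:, feats] * w)
                      for feats, w, knn in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
ensemble = fit_ensemble(X, y, seed=0)
pred = predict(ensemble, X)
```

Randomizing both the observations (bootstrap) and the feature subsets is what gives the base models the diversity the title refers to, while the feature weighting suppresses the influence of noisy features on the distance computation.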
