Article

A Weighted k-Nearest Neighbours Ensemble With Added Accuracy and Diversity

Journal

IEEE ACCESS
Volume 10, Pages 125920-125929

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/ACCESS.2022.3225682

Keywords

Classification; feature weighting; k-nearest neighbor ensemble; support vectors


The paper proposes a $k$NN ensemble method that identifies nearest observations based on distances weighted via support vectors, and shows improved classification performance on datasets with noisy features. The class of a test observation is estimated by majority voting, and the method outperforms competing methods in most cases.
Ensembles based on $k$NN models are considered effective in reducing the adverse effect of outliers, primarily by identifying the observations closest to a test point in the given training data. The class label of the test point is then estimated by a majority vote over the class labels of those nearest observations. While identifying the closest observations, certain training patterns may carry higher regulatory power than others; assigning weights to observations and computing weighted distances is therefore important in this setting. This paper proposes a $k$NN ensemble that identifies nearest observations based on their weighted distance, in relation to the response variable, via support vectors. A large number of $k$NN models is built, each on a bootstrap sample drawn from the training data together with a randomly selected subset of features from the given feature space. The estimated class of a test observation is decided via majority voting over the estimates of all base $k$NN models. The ensemble is assessed on 14 benchmark and simulated datasets against other classical methods, including $k$NN-based models, using the Brier score, classification accuracy, and the kappa statistic as performance measures. On both the benchmark and simulated datasets, the proposed ensemble outperformed the competing methods in the majority of cases, giving better overall classification performance than the other methods on 8 datasets. The analyses on simulated datasets reveal that the proposed method is effective in classification problems involving noisy features. Furthermore, feature weighting and randomization make the method robust to the choice of $k$, i.e., the number of nearest observations used in a base model.
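
As a rough illustration of the procedure described in the abstract, the Python sketch below builds such an ensemble. It is not the authors' implementation: the feature weights are approximated here by the absolute coefficients of a linear SVM (one plausible way to derive weights from support vectors in relation to the response), and B, k, and the feature-subset size are arbitrary illustrative choices.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

def fit_ensemble(X, y, B=500, k=5, n_sub=None, seed=None):
    """Fit B weighted-kNN base models on bootstrap samples and random feature subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_sub = n_sub or max(2, int(np.sqrt(p)))             # size of each random feature subset
    # Illustrative feature weights: |coefficients| of a linear SVM fitted to (X, y),
    # standing in for the paper's support-vector-based weighting scheme.
    w = np.abs(LinearSVC().fit(X, y).coef_).mean(axis=0)
    models = []
    for _ in range(B):
        rows = rng.integers(0, n, size=n)                # bootstrap sample of the training data
        cols = rng.choice(p, size=n_sub, replace=False)  # random subset of features
        # Scaling each feature by sqrt(w) turns Euclidean distance into a weighted distance.
        Xb = X[np.ix_(rows, cols)] * np.sqrt(w[cols])
        models.append((KNeighborsClassifier(n_neighbors=k).fit(Xb, y[rows]), cols))
    return models, w

def predict_ensemble(models, w, X_new):
    """Majority vote over the estimates of all base kNN models."""
    votes = np.stack([m.predict(X_new[:, cols] * np.sqrt(w[cols]))
                      for m, cols in models])            # shape (B, n_test)
    labels = []
    for col in votes.T:                                  # one column per test point
        vals, counts = np.unique(col, return_counts=True)
        labels.append(vals[np.argmax(counts)])
    return np.array(labels)

Under these assumptions, the random feature subsets inject diversity across base models while the SVM-derived weights downweight uninformative dimensions, which matches the intuition behind the robustness claims in the abstract.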
