Journal
NEUROCOMPUTING
Volume 251, Pages 26-34
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2017.04.018
Keywords
Instance selection; Nearest neighbor; Regression; Data reduction; Significant difference
Categories
Funding
- National Natural Science Foundation of China [61432011, 61603230, U1435212]
- National Key Basic Research and Development Program of China (973) [2013CB329404]
The k-nearest neighbor algorithm (kNN) is a very simple algorithm for classification and regression. It is also a lazy algorithm: it performs no generalization over the training data, instead keeping all training instances available during the testing phase. The size of the training set is therefore a major concern for kNN, since a large training set leads to slow execution and high memory requirements. Many efforts have been devoted to this problem, but they have focused mainly on kNN classification. Here we propose an algorithm to decrease the size of the training set for kNN regression (DISKR). The algorithm first removes outlier instances that degrade the regressor's performance, then sorts the remaining instances by the difference between their outputs and those of their nearest neighbors. Finally, the instances that contribute little, as measured by the training error, are successively deleted according to this order. The proposed algorithm is compared with five state-of-the-art algorithms on 19 datasets; the experimental results show that it achieves similar prediction ability while having the lowest instance storage ratio. (C) 2017 Elsevier B.V. All rights reserved.
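The three steps described in the abstract can be sketched in Python as follows. This is a minimal illustration of a DISKR-style reduction, not the authors' exact method: the outlier threshold, the deletion order (smallest neighbor-output difference first, treating such instances as redundant), and the error tolerance are all assumptions made for the sketch.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Predict by averaging the outputs of the k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    return y_train[idx].mean()

def diskr_sketch(X, y, k=3, outlier_thresh=2.0, error_tol=0.0):
    """DISKR-style sketch: (1) drop outliers, (2) sort survivors by the
    output difference to their nearest neighbors, (3) successively delete
    instances whose removal does not increase the training error."""
    n = len(X)
    keep = np.ones(n, dtype=bool)

    # Step 1: outlier removal by leave-one-out k-NN residual
    # (the 2-sigma threshold is an assumption, not from the paper).
    resid = np.empty(n)
    for i in range(n):
        mask = keep.copy()
        mask[i] = False
        resid[i] = abs(y[i] - knn_predict(X[mask], y[mask], X[i], k))
    keep &= resid <= outlier_thresh * resid.std()

    # Step 2: order surviving instances by that neighbor/output difference;
    # smallest difference first, treating those instances as most redundant.
    order = [i for i in np.argsort(resid) if keep[i]]

    def train_error(mask):
        """Mean squared leave-one-out k-NN error over the kept instances."""
        errs = []
        for i in range(n):
            if not mask[i]:
                continue
            m = mask.copy()
            m[i] = False
            if m.sum() < k:
                return np.inf  # too few instances left to form k neighbors
            errs.append((y[i] - knn_predict(X[m], y[m], X[i], k)) ** 2)
        return np.mean(errs)

    # Step 3: greedily delete instances that contribute little,
    # i.e. whose removal does not raise the training error.
    base = train_error(keep)
    for i in order:
        trial = keep.copy()
        trial[i] = False
        e = train_error(trial)
        if e <= base + error_tol:
            keep, base = trial, e
    return keep
```

The sketch returns a boolean mask over the training set; the kept subset can then be used as the reduced training data for a standard kNN regressor.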