Article

A novel extreme learning machine based kNN classification method for dealing with big data

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 183, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115293

Keywords

Big data; kNN; ELM; Label matrix; Correctness matrix; Tree


A new fast and robust kNN finding framework is introduced in this paper for dealing with big datasets. Training data samples are grouped into partitions based on the outputs of mini-classifiers, and a tree structure indexes the partitions so that the group of data samples most relevant to an input can be located quickly. Experimental results show better performance in most cases and comparable performance in the remaining cases on big data problems.
The kNN algorithm, as an effective data mining technique, is widely used for supervised classification. However, previously proposed kNN search methods cannot be considered efficient for dealing with big data. Since big datasets are generated and expanded daily on various online and offline servers, efficient kNN search methods for such data are needed. Moreover, massive datasets contain more noisy and imperfect samples, which significantly increases the need for a robust kNN search method. In this paper, a new fast and robust kNN finding framework is introduced for dealing with big datasets. In this method, the group of data samples most relevant to an input sample is detected, and the original kNN method is applied to that group to find the final nearest neighbors. The main goal of this method is to handle big datasets in an accurate, fast, and robust manner. The training data samples of each label are grouped into partitions based on the outputs of several mini-classifiers (i.e., ELM classifiers); the behavior of the mini-classifiers is the basis for partitioning the training data samples. These mini-classifiers are trained on non-overlapping subsets of the training set, one subset per mini-classifier. An index is calculated for each partition so that the relevant partition can be located quickly using a tree structure in which each partition index falls into a leaf. The outputs of the mini-classifiers for an input test sample are then used to traverse the tree and find the group of data samples most relevant to that input. Experimental results indicate that the proposed method has better performance in most cases and comparable performance in the remaining cases on both original and noisy big data problems.
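The partition-and-lookup pipeline described in the abstract can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the authors' implementation: RandomHiddenELM, build_partition_index, and query_knn are hypothetical names, a basic ELM with a random hidden layer and pseudo-inverse output weights stands in for the paper's mini-classifiers, and a plain dictionary keyed by the tuple of mini-classifier predictions stands in for the tree over partition indices.

```python
import numpy as np

class RandomHiddenELM:
    """Minimal ELM sketch: fixed random hidden layer, least-squares output weights."""
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)            # fixed random hidden activations

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot label matrix
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T            # output weights via pseudo-inverse
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


def build_partition_index(X, y, n_mini=3, seed=0):
    """Train mini-ELMs on disjoint subsets and group training samples by their joint outputs."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(len(X)), n_mini)       # non-overlapping subsets, one per mini-classifier
    minis = [RandomHiddenELM(seed=i).fit(X[idx], y[idx]) for i, idx in enumerate(chunks)]

    # Partition key = tuple of mini-classifier predictions; a dict plays the role of the tree index here.
    keys = np.stack([m.predict(X) for m in minis], axis=1)
    index = {}
    for i, key in enumerate(map(tuple, keys)):
        index.setdefault(key, []).append(i)
    return minis, index


def query_knn(x, X, y, minis, index, k=5):
    """Route a query to its partition via the mini-classifiers, then run exact kNN inside it."""
    key = tuple(m.predict(x[None, :])[0] for m in minis)
    cand = np.fromiter(index.get(key, range(len(X))), dtype=int)   # fall back to full search if the partition is empty
    d = np.linalg.norm(X[cand] - x, axis=1)
    nearest = cand[np.argsort(d)[:k]]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                               # majority vote among the k neighbors
```

In this sketch, build_partition_index would be fit once offline, and query_knn restricts the exact kNN search to the single partition the mini-classifiers route the query into; that restriction is where the speed-up over a full scan would come from, in the spirit of the method the abstract describes.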
