☆ 4.2 Article

Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets

SN APPLIED SCIENCES (2019)

Journal

SN APPLIED SCIENCES

Volume 1, Issue 12, Pages -

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

DOI: 10.1007/s42452-019-1356-9

Keywords

k-nearest neighbour; Heterogeneous data set; Combination similarity measures

Category

Multidisciplinary Sciences

Abstract

Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour (k-NN) classifier is one of the most popular distance-based algorithms. It classifies a test sample by measuring the distances between that sample and the training samples. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data are described by a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data: binary data. Several similarity measures are defined by combining well-known distances for numerical and binary data, and are used to investigate k-NN performance in classifying such heterogeneous data sets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better for heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data call for personalised similarity measures adapted to the data characteristics.
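The idea of combining a numerical distance with a binary distance inside k-NN can be illustrated with a minimal sketch. The abstract does not list the specific measures the paper evaluates, so the combination below (Euclidean distance on the numerical features plus Hamming distance on the binary features, summed without weights) is an assumption chosen for illustration only. The function names, the index lists and the toy data are hypothetical.

```python
import math
from collections import Counter


def combined_distance(a, b, num_idx, bin_idx):
    """Illustrative mixed distance (an assumption, not the paper's measure):
    Euclidean distance over numerical features plus Hamming distance
    over binary features."""
    euclid = math.sqrt(sum((a[i] - b[i]) ** 2 for i in num_idx))
    hamming = sum(1 for i in bin_idx if a[i] != b[i])
    return euclid + hamming


def knn_predict(train_X, train_y, query, k, num_idx, bin_idx):
    """Return the majority label among the k training samples closest to query."""
    dists = sorted(
        (combined_distance(x, query, num_idx, bin_idx), label)
        for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numerical features (assumed already scaled
# to [0, 1]) followed by two binary features.
train_X = [
    [0.1, 0.2, 0, 0],
    [0.2, 0.1, 0, 1],
    [0.9, 0.8, 1, 1],
    [0.8, 0.9, 1, 0],
]
train_y = ["a", "a", "b", "b"]
NUM_IDX = [0, 1]
BIN_IDX = [2, 3]

prediction = knn_predict(train_X, train_y, [0.15, 0.15, 0, 0], 3, NUM_IDX, BIN_IDX)
print(prediction)
```

In a sketch like this, the numerical features would normally be scaled first. Otherwise the Euclidean term can dominate the Hamming term, or the reverse, which is one reason a distance adapted to the data may behave differently from plain Euclidean distance on mixed features.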
