☆ 4.6 Article

Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

SOFT COMPUTING (2021)

期刊

SOFT COMPUTING

卷 25, 期 24, 页码 15403-15415

出版社

SPRINGER

DOI: 10.1007/s00500-021-06178-2

关键词

String Space; Data Reduction; k-Nearest Neighbor; Prototype Generation

类别

Computer Science, Artificial Intelligence Computer Science, Interdisciplinary Applications

资金

Programa I+D+i de la Generalitat Valenciana [ACIF/2019/042, APOSTD/2020/256]
Spanish Ministrythrough HISPAMUS project [TIN2017-86576-R]
EU
University of Alicante [GRE19-04]
CRUE-CSIC agreement with Springer Nature

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

The k-nearest neighbor rule is well-known for its high performance and versatility, but it also suffers from low efficiency due to the exhaustive search required for each new query. To address this issue, data reduction techniques have been used, but their application to complex structural data has been limited. A new adaptation of the reduction through homogeneous clusters algorithm for string data shows significant improvements in classification performance and reduction rates compared to the set-median version.

The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility as it requires only the definition of a dissimilarity measure. Nevertheless, kNN is also coupled with low-efficiency levels since, for each new query, the algorithm must carry out an exhaustive search of the training data, and this drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm denominated as reduction through homogeneous clusters (RHC) has recently been adapted to string representations but as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we, therefore, present a new adaptation of the RHC algorithm for string data, in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates.

Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

期刊

SOFT COMPUTING

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

期刊

SOFT COMPUTING

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文