4.6 Article

Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

期刊

SOFT COMPUTING
卷 25, 期 24, 页码 15403-15415

出版社

SPRINGER
DOI: 10.1007/s00500-021-06178-2

关键词

String Space; Data Reduction; k-Nearest Neighbor; Prototype Generation

资金

  1. Programa I+D+i de la Generalitat Valenciana [ACIF/2019/042, APOSTD/2020/256]
  2. Spanish Ministrythrough HISPAMUS project [TIN2017-86576-R]
  3. EU
  4. University of Alicante [GRE19-04]
  5. CRUE-CSIC agreement with Springer Nature

向作者/读者索取更多资源

The k-nearest neighbor rule is well-known for its high performance and versatility, but it also suffers from low efficiency due to the exhaustive search required for each new query. To address this issue, data reduction techniques have been used, but their application to complex structural data has been limited. A new adaptation of the reduction through homogeneous clusters algorithm for string data shows significant improvements in classification performance and reduction rates compared to the set-median version.
The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility as it requires only the definition of a dissimilarity measure. Nevertheless, kNN is also coupled with low-efficiency levels since, for each new query, the algorithm must carry out an exhaustive search of the training data, and this drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm denominated as reduction through homogeneous clusters (RHC) has recently been adapted to string representations but as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we, therefore, present a new adaptation of the RHC algorithm for string data, in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据