Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets

Journal: COMPLEXITY, Volume 2021 (2021)

Publisher: WILEY-HINDAWI

DOI: 10.1155/2021/2011738

Funding: Analytical Center for the Government of the Russian Federation [70-2021-00143, IGK 000000D730321P5Q0002]

Automated Summary

This study introduces a classification algorithm built on index data structures for fast, scalable classification of large multidimensional datasets. It reports a 2-4 times classification speedup over the 1-NN method with asymptotically close classification accuracy.

Abstract

k-nearest neighbours (kNN) is a very popular instance-based classifier owing to its simplicity and good empirical performance. However, large-scale datasets make it difficult to build fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which allows us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses a navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows a 2-4 times classification speedup for both average and 99th percentile time, with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases where the method uses swap memory. We show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.
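
To make the contrast in the abstract concrete, the following is a minimal sketch of the general idea of replacing an exhaustive 1-NN scan with a greedy walk over a proximity graph. It is not the authors' graph-cut method and does not build a real NSW index: the brute-force k-nearest-neighbour graph is only a stand-in, and all function names and parameters (build_knn_graph, greedy_search, graph_classify, k, restarts, seed) are hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn(points, labels, query):
    """Exact 1-NN baseline: scan every stored point."""
    best = min(range(len(points)), key=lambda i: dist(points[i], query))
    return labels[best]


def build_knn_graph(points, k=5):
    """Stand-in proximity graph (assumption, not NSW): link each point
    to its k nearest points with undirected edges, built by brute force."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_search(points, graph, query, start):
    """Step to the neighbour closest to the query until none improves.
    Returns the index of a local minimum, which may not be the true 1-NN."""
    current = start
    current_d = dist(points[current], query)
    while True:
        nxt = min(graph[current],
                  key=lambda j: dist(points[j], query),
                  default=current)
        nxt_d = dist(points[nxt], query)
        if nxt_d >= current_d:
            return current
        current, current_d = nxt, nxt_d


def graph_classify(points, labels, graph, query, restarts=3, seed=0):
    """Label the query with the class of the best vertex found by a few
    greedy walks from random entry points."""
    rng = random.Random(seed)
    found = [greedy_search(points, graph, query, rng.randrange(len(points)))
             for _ in range(restarts)]
    best = min(found, key=lambda i: dist(points[i], query))
    return labels[best]
```

Each greedy walk touches only the neighbours of the vertices it visits rather than the whole dataset, which is where a graph index can save time; the price is that the walk may stop at a local minimum, so the result approximates rather than reproduces exact 1-NN.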
