
Billion-Scale Similarity Search with GPUs

Journal

IEEE TRANSACTIONS ON BIG DATA
Volume 7, Issue 3, Pages 535-547

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2019.2921572

Keywords

Similarity search; multimedia databases; indexing methods; graphical processing units


This paper addresses how to make better use of GPUs for similarity search, proposing a novel design for k-selection that outperforms existing approaches by large margins. The implementation reaches up to 55 percent of theoretical peak performance and enables significantly faster nearest-neighbor search on GPUs. It also allows high-accuracy k-NN graphs to be built on large datasets in a fraction of the time required by prior methods.
Similarity search finds application in database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks such as distance computation, prior approaches in this domain are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a novel design for k-selection. We apply it in different similarity search scenarios, by optimizing brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation operates at up to 55 percent of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5× faster than the prior GPU state of the art. It enables the construction of a high-accuracy k-NN graph on 95 million images from the YFCC100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
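
To make the two stages named in the abstract concrete, the following is a minimal CPU sketch of brute-force nearest-neighbor search: a distance computation step followed by a k-selection step. It is not the paper's GPU k-selection design; it uses Python's standard `heapq.nsmallest` as a stand-in, and the names `squared_l2`, `brute_force_knn`, and the toy `database` are hypothetical, chosen only for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k nearest database vectors as (distance, index) pairs."""
    # Stage 1: distance computation. Each distance is independent of the
    # others, which is why this stage is described as data parallel.
    distances = ((squared_l2(query, v), i) for i, v in enumerate(database))
    # Stage 2: k-selection. Keep only the k smallest distances instead of
    # sorting everything (heapq is an assumption of this sketch, not the
    # method proposed in the paper).
    return heapq.nsmallest(k, distances)


# Toy data, purely illustrative.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
neighbors = brute_force_knn([0.0, 0.1], database, k=2)
neighbor_ids = [index for _, index in neighbors]
print(neighbor_ids)
```

In this sketch the selection step processes candidates one after another, which illustrates why k-selection exposes less parallelism than the distance computation that feeds it.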
