Article

Efficient locality-sensitive hashing over high-dimensional streaming data

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 35, Issue 5, Pages 3753-3766

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-020-05336-1

Keywords

Approximate nearest neighbor search; Locality-sensitive hashing; LSM-tree; Streaming data

This paper presents a novel disk-based LSH index that provides efficient support for both searches and updates. By storing LSH projections in write-friendly LSM-trees and developing a novel estimation scheme, it improves search efficiency and reduces the cost of disk storage and access. Experimental results demonstrate that the proposed method outperforms state-of-the-art schemes on four real-world datasets.
Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology for solving the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for search efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use write-friendly LSM-trees to store the LSH projections, which facilitates efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.
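
The two ideas named in the abstract, storing LSH projections in a write-friendly structure and selecting candidates by both collision number and projection distance, can be illustrated with a small in-memory sketch. This is not the paper's implementation. The class names `TinyLSM` and `StreamingLSH`, the parameters `buffer_limit`, `width`, and `min_collisions`, and the use of Gaussian random projections are all assumptions made for illustration. Compaction, real disk I/O, and the paper's scheme for estimating the number of LSH functions are left out.

```python
import bisect
import math
import random


class TinyLSM:
    """Toy LSM-style store of (projection, point_id) pairs (hypothetical)."""

    def __init__(self, buffer_limit=4):
        self.buffer = []  # recent writes, unsorted (the "memtable")
        self.runs = []    # immutable sorted runs, stand-ins for on-disk files
        self.buffer_limit = buffer_limit

    def insert(self, projection, point_id):
        self.buffer.append((projection, point_id))
        if len(self.buffer) >= self.buffer_limit:
            # Flush: the buffer becomes one sorted run; older runs are untouched.
            self.runs.append(sorted(self.buffer))
            self.buffer = []

    def range_query(self, low, high):
        """Return ids whose projection lies in [low, high]."""
        hits = [pid for proj, pid in self.buffer if low <= proj <= high]
        for run in self.runs:
            start = bisect.bisect_left(run, (low, float("-inf")))
            for proj, pid in run[start:]:
                if proj > high:
                    break
                hits.append(pid)
        return hits


class StreamingLSH:
    """One random projection per LSH function, each kept in its own TinyLSM."""

    def __init__(self, dim, num_functions=8, seed=0):
        rng = random.Random(seed)
        self.vectors = [[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(num_functions)]
        self.trees = [TinyLSM() for _ in range(num_functions)]
        self.points = {}

    @staticmethod
    def _project(a, x):
        return sum(ai * xi for ai, xi in zip(a, x))

    def insert(self, point_id, x):
        self.points[point_id] = x
        for a, tree in zip(self.vectors, self.trees):
            tree.insert(self._project(a, x), point_id)

    def query(self, q, width=1.0, min_collisions=4, k=1):
        q_proj = [self._project(a, q) for a in self.vectors]
        # Step 1: count in how many projections each point falls near the query.
        collisions = {}
        for center, tree in zip(q_proj, self.trees):
            for pid in tree.range_query(center - width, center + width):
                collisions[pid] = collisions.get(pid, 0) + 1
        candidates = [p for p, c in collisions.items() if c >= min_collisions]

        # Step 2: rank the surviving candidates by distance in projected space.
        def projection_distance(pid):
            x = self.points[pid]
            return math.sqrt(sum((self._project(a, x) - c) ** 2
                                 for a, c in zip(self.vectors, q_proj)))

        candidates.sort(key=projection_distance)
        # Step 3: verify a short list with the true Euclidean distance.
        shortlist = candidates[: 4 * k]
        shortlist.sort(key=lambda pid: math.dist(self.points[pid], q))
        return shortlist[:k]


index = StreamingLSH(dim=3)
index.insert(0, [0.0, 0.0, 0.0])
index.insert(1, [0.1, 0.0, 0.1])
for i in range(2, 8):
    index.insert(i, [50.0 * i, -50.0 * i, 25.0 * i])
nearest = index.query([0.0, 0.0, 0.0], k=1)
print(nearest)
```

In this sketch an insert only appends to a buffer and occasionally writes one sorted run, which is the property that makes LSM-style storage attractive for streaming updates. The collision threshold prunes most points cheaply before any full-dimensional distance is computed.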
