3.8 Proceedings Paper

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3159652.3159694

Keywords

locality sensitive hashing; dynamic data streams; Jaccard index

Funding

  1. Deutsche Forschungsgemeinschaft within the Collaborative Research Center SFB 876
  2. Google Focused Award on Web Algorithmics for Large-scale Data Analysis
  3. Microsoft Azure for Research Award (Online Event Detection and Evolution Tracking in Activity Networks)

Ask authors/readers for more resources

Recommender systems are an integral part of many web applications. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these methods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifically, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box reduction to l(0) norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH approaches for the insertion only case, and drastically outperform previous algorithms in case of deletions.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available