Article

Resemblance and mergence based indexing for high performance data deduplication

Journal

JOURNAL OF SYSTEMS AND SOFTWARE
Volume 128, Pages 11-24

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jss.2017.02.039

Keywords

Fast index; Deduplication; Resemblance mergence; Fingerprint retrieval; Key value index

Funding

  1. Natural Science Foundation of China (NSFC) [61502189, 61232004]
  2. U.S. National Science Foundation (NSF) [CCF-1547804, CNS-1218960, CNS-1320349]
  3. Division of Computing and Communication Foundations (NSF)
  4. Directorate for Computer & Information Science & Engineering [1547804]; Funding Source: National Science Foundation

Data deduplication, a data redundancy elimination technique, has been widely employed in many application environments to reduce data storage space. However, providing a fast and scalable key-value fingerprint index is challenging, particularly for large datasets, even though index performance is critical to overall deduplication performance. This paper proposes RMD, a resemblance and mergence based deduplication scheme that aims to provide quick responses to fingerprint queries. The key idea of RMD is to leverage a Bloom filter array and a data resemblance algorithm to dramatically reduce the query range. At data ingestion time, RMD uses a resemblance algorithm to detect similar data segments and places them in the same bin. As a result, at query time it only needs to search the corresponding bin to detect duplicate content, which significantly speeds up the query process. Moreover, RMD uses a mergence strategy to accumulate resembling segments into the relevant bins, and exploits a frequency-based fingerprint retention policy to cap bin capacity, improving both query throughput and deduplication ratio. Extensive experiments with real-world datasets show that RMD achieves high query performance and outperforms several well-known deduplication schemes. (C) 2017 Elsevier Inc. All rights reserved.
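The abstract describes the bin-based query path only at a high level. The Python sketch below illustrates the general idea under stated assumptions; it is not RMD's actual implementation. It assumes SHA-1 chunk fingerprints, uses the minimum chunk fingerprint of a segment as its resemblance feature (a min-hash style technique, swapped in because the abstract does not name RMD's resemblance algorithm), keeps one Bloom filter per bin as the "Bloom filter array", and enforces the bin capacity cap by retaining the most frequently referenced fingerprints. All names (fingerprint, BloomFilter, Bin, ResemblanceIndex, bin_capacity) are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterator, List


def fingerprint(chunk: bytes) -> bytes:
    # Assumed: SHA-1 chunk fingerprints, a common choice in deduplication systems.
    return hashlib.sha1(chunk).digest()


class BloomFilter:
    """Minimal Bloom filter; k bit positions are sliced from one SHA-256 digest (k <= 8)."""

    def __init__(self, m_bits: int = 8192, k: int = 4) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        digest = hashlib.sha256(key).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Bin:
    """Hypothetical bin holding fingerprints of resembling segments with reference counts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Counter = Counter()  # fingerprint -> how often it was referenced
        self.bloom = BloomFilter()        # this bin's entry in the Bloom filter array

    def lookup(self, fp: bytes) -> bool:
        # The Bloom filter rejects most non-duplicates cheaply; the dict confirms hits.
        return fp in self.bloom and fp in self.counts

    def merge(self, fps: List[bytes]) -> None:
        """Mergence step (assumed form): fold a segment's fingerprints in, then cap size."""
        for fp in fps:
            self.counts[fp] += 1
            self.bloom.add(fp)
        if len(self.counts) > self.capacity:
            # Frequency-based retention: keep the most often referenced fingerprints.
            self.counts = Counter(dict(self.counts.most_common(self.capacity)))
            # Bloom filters cannot delete, so rebuild from the retained fingerprints.
            self.bloom = BloomFilter()
            for fp in self.counts:
                self.bloom.add(fp)


class ResemblanceIndex:
    """Routes each segment to one bin chosen by a min-hash style representative."""

    def __init__(self, bin_capacity: int = 1024) -> None:
        self.bin_capacity = bin_capacity
        self.bins: Dict[bytes, Bin] = {}

    def ingest(self, segment: List[bytes]) -> int:
        """Return how many chunks of a non-empty `segment` were already in its bin."""
        fps = [fingerprint(chunk) for chunk in segment]
        rep = min(fps)  # assumed resemblance feature: the minimum chunk fingerprint
        b = self.bins.get(rep)
        if b is None:
            b = self.bins[rep] = Bin(self.bin_capacity)
        duplicates = sum(1 for fp in fps if b.lookup(fp))  # search only this bin
        b.merge(fps)
        return duplicates


idx = ResemblanceIndex(bin_capacity=4)
segment = [b"alpha", b"beta", b"gamma"]
print(idx.ingest(segment))  # 0: the bin is new, nothing to match
print(idx.ingest(segment))  # 3: same representative, same bin, all chunks found
```

Because each query consults a single bin's Bloom filter and dictionary, its cost does not grow with the number of bins. The minimum-fingerprint choice works as a resemblance test because, for an ideal random hash, two segments share the same minimum with probability equal to the Jaccard similarity of their chunk sets; SHA-1 only approximates such a hash.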
