Article

Improving the Performance of Deduplication-Based Storage Cache via Content-Driven Cache Management Methods

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2020.3012704

Keywords

Metadata; Cache storage; Redundancy; Performance evaluation; Distributed databases; Degradation; Indexes; Data deduplication; storage cache; content sharing

Funding

  1. Open Project Program of Wuhan National Laboratory for Optoelectronics [2019WNLOKF009]
  2. Fundamental Research Funds for the Central Universities [2019CDJGFJSJ001]
  3. National Natural Science Foundation of China [61402061, 61672116, 61802038]
  4. Chongqing High-Tech Research Program [cstc2016jcyjA0274, cstc2016jcyjA0332]
  5. China Postdoctoral Science Foundation [2017M620412]
  6. Chongqing Postdoctoral Special Science Foundation [XmT2018003]


Data deduplication effectively reduces data size in storage systems, but deduplication-aware caches perform poorly at larger block sizes. The proposed CDAC exploits content redundancy and sharing intensity in cache management, outperforming existing algorithms by up to 23.83X in read cache hit ratio and up to 53.3% in IOPS under real-world workloads.
Data deduplication, a proven technology for effective data reduction in backup and archival storage systems, also shows promise for increasing the logical capacity of storage caches by removing redundant data. However, our in-depth evaluation of existing deduplication-aware caching algorithms reveals that they work well only when the cached block size is 4 KB. Modern storage systems often use block sizes much larger than 4 KB, and in that scenario the overall performance of these caching schemes drops below that of conventional replacement algorithms without any deduplication. There are three reasons for this degradation. The first is deduplication overhead: the time spent generating data fingerprints and using them to identify duplicate data offsets the benefits of deduplication. The second is the extremely low cache space utilization caused by read and write alignment. The third is that existing algorithms exploit only access locality when choosing blocks to replace, missing the opportunity to leverage content usage patterns, such as the intensity of content redundancy and sharing, to further improve performance in deduplication-based storage caches. We propose CDAC, a Content-driven Deduplication-Aware Cache, to address this problem. CDAC exploits the content redundancy of blocks and the intensity of content sharing among source addresses in its cache management strategies. We have implemented CDAC on top of the LRU and ARC algorithms, yielding CDAC-LRU and CDAC-ARC, respectively. Our extensive experimental results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms D-LRU and D-ARC by up to 23.83X in read cache hit ratio, with an average of 3.23X, and by up to 53.3 percent in IOPS, with an average of 49.8 percent, under a real-world mixed workload when the cache size ranges from 20 to 50 percent of the workload size and the block size ranges from 4 KB to 32 KB.
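
To make the fingerprint-based deduplication described in the abstract concrete, the following is a minimal sketch of a deduplication-aware LRU cache. It is not the CDAC algorithm, nor a faithful D-LRU implementation. The class name `DedupLRUCache`, the choice of SHA-1 as the fingerprint, and the policy of evicting by least recently used source address are all assumptions made for illustration only.

```python
import hashlib
from collections import OrderedDict


class DedupLRUCache:
    """Toy deduplication-aware LRU cache (hypothetical; not CDAC or D-LRU)."""

    def __init__(self, capacity_blocks):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be at least 1")
        self.capacity = capacity_blocks  # max number of unique content blocks
        self.addr_to_fp = OrderedDict()  # source address -> fingerprint (LRU order)
        self.store = {}                  # fingerprint -> [data, reference count]

    @staticmethod
    def fingerprint(data):
        # Assumption: SHA-1 stands in for whatever fingerprint a real system uses.
        return hashlib.sha1(data).hexdigest()

    def _release(self, fp):
        # Drop one reference; free the block when no address shares it anymore.
        entry = self.store[fp]
        entry[1] -= 1
        if entry[1] == 0:
            del self.store[fp]

    def write(self, addr, data):
        fp = self.fingerprint(data)
        old_fp = self.addr_to_fp.pop(addr, None)
        if old_fp is not None:
            self._release(old_fp)
        if fp in self.store:
            # Duplicate content: share the stored block instead of copying it.
            self.store[fp][1] += 1
        else:
            # Evict least recently used addresses until a unique block fits.
            while len(self.store) >= self.capacity:
                _, victim_fp = self.addr_to_fp.popitem(last=False)
                self._release(victim_fp)
            self.store[fp] = [data, 1]
        self.addr_to_fp[addr] = fp  # newest entry sits at the MRU end

    def read(self, addr):
        fp = self.addr_to_fp.get(addr)
        if fp is None:
            return None  # cache miss
        self.addr_to_fp.move_to_end(addr)  # refresh recency on a hit
        return self.store[fp][0]
```

In this sketch the reference count of a stored block equals the number of source addresses currently sharing its content. A content-driven policy could use a signal of that kind when choosing victims, whereas the sketch above relies on recency alone.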
