Article

Fast Proactive Repair in Erasure-Coded Storage: Analysis, Design, and Implementation

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2022.3152817

Keywords

Erasure coding; repair; distributed storage

Funding

  1. National Key R&D Program of China [2021YFF0704001]
  2. Natural Science Foundation of China [62072381]
  3. CCF-Huawei Innovation Research Plan [CCF2021-admin-270-202102]
  4. Xiamen Youth Innovation Fund [3502Z20206052]

Erasure coding offers a storage-efficient redundancy mechanism for maintaining data availability guarantees in large-scale storage clusters, yet it also incurs high performance overhead in failure repair. Recent developments in accurate disk failure prediction allow soon-to-fail (STF) nodes to be repaired in advance, thereby opening new opportunities for accelerating failure repair in erasure-coded storage. To this end, we present a fast proactive repair solution called FastPR, which carefully couples two repair methods, namely migration (i.e., relocating the chunks of an STF node) and reconstruction (i.e., decoding the chunks of an STF node through erasure coding), so as to fully parallelize the repair operation across the storage cluster. FastPR solves a bipartite maximum matching problem and schedules both migration and reconstruction in a parallel fashion. We show that FastPR significantly reduces the repair time over the baseline repair approaches for both Reed-Solomon codes and Azure's Local Reconstruction Codes via mathematical analysis, large-scale simulation, and Amazon EC2 experiments.
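FastPR's scheduling is built on a bipartite maximum matching. The abstract does not spell out how FastPR constructs its graph, so the sketch below only illustrates the generic technique (Kuhn's augmenting-path algorithm) on a simplified, hypothetical model. The left vertices are the chunks of the STF node. The right vertices are healthy nodes. An edge means a node may host that repaired chunk, under an assumed rule that a node must not already hold another chunk of the same stripe. A matching is then a set of chunks that can be repaired concurrently with no node handling two of them. The names `build_candidates`, `stripe_nodes`, and the eligibility rule are assumptions for illustration, not FastPR's actual interface or formulation.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm.

    adj maps each left vertex (a chunk to repair) to the right vertices
    (healthy nodes) it may be paired with. Returns a dict chunk -> node.
    """
    owner: Dict[str, str] = {}  # right vertex (node) -> matched left vertex (chunk)

    def augment(chunk: str, visited: Set[str]) -> bool:
        for node in adj.get(chunk, []):
            if node in visited:
                continue
            visited.add(node)
            # Take the node if it is free, or if the chunk currently holding it
            # can be re-routed to another node along an augmenting path.
            if node not in owner or augment(owner[node], visited):
                owner[node] = chunk
                return True
        return False

    for chunk in adj:
        augment(chunk, set())
    return {chunk: node for node, chunk in owner.items()}


def build_candidates(
    stf_chunks: Dict[str, str],         # hypothetical: chunk id -> stripe id
    stripe_nodes: Dict[str, Set[str]],  # hypothetical: stripe id -> nodes holding its chunks
    healthy_nodes: List[str],
) -> Dict[str, List[str]]:
    """Assumed eligibility rule: a node may host a repaired chunk only if it
    holds no other chunk of the same stripe, so each stripe stays spread
    across distinct nodes."""
    return {
        chunk: [n for n in healthy_nodes if n not in stripe_nodes[stripe]]
        for chunk, stripe in stf_chunks.items()
    }


if __name__ == "__main__":
    chunks = {"c1": "A", "c2": "B", "c3": "C", "c4": "D"}
    stripes = {
        "A": {"STF", "N3"},
        "B": {"STF", "N2", "N3"},
        "C": {"STF", "N1"},
        "D": {"STF", "N2", "N3"},
    }
    plan = max_bipartite_matching(build_candidates(chunks, stripes, ["N1", "N2", "N3"]))
    print(plan)
    # c4 stays unmatched: its only candidate, N1, is taken by c2.
    print(sorted(set(chunks) - set(plan)))
```

In this toy instance, c2 can only use N1, so the augmenting path moves c1 from N1 to N2 to make room. Chunk c4 also competes for N1 and is left unmatched. In this model it would wait for a later round or be migrated instead. Kuhn's algorithm is chosen here for brevity; Hopcroft-Karp gives a better asymptotic bound on large clusters.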