On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers

IEEE TRANSACTIONS ON RELIABILITY (2021)

Journal

IEEE TRANSACTIONS ON RELIABILITY

Volume 70, Issue 2, Pages 507-524

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TR.2020.3007127

Keywords

Estimation; Reliability; Measurement; Data models; Probability density function; Predictive models; Kernel; Data analytics; data storage; hard-disk systems; kernel density estimation (KDE); modeling

Funding

Scientific and Technological Research Council of Turkey (TUBITAK) [115C111, 119E235]
Spanish MINECO [TEC2017-88373-R]
Generalitat de Catalunya [2017SGR1195]

Automated Summary

This article models disk failure trends in big data centers and proposes estimating the failure density by transforming the failure data and then applying the inverse transformation. The study suggests that, for heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can represent the data best, provided that overfitting is avoided and the complexity burden is overcome.

Abstract

It has become commonplace to observe frequent multiple disk failures in big data centers in which thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined reliability. However, in order to optimize data protection, real-life disk failure trends need to be modeled appropriately. The classical approach to modeling is to estimate the probability density function of failures using nonparametric estimation techniques such as kernel density estimation (KDE). However, these techniques are suboptimal in the absence of the true underlying density function. Moreover, insufficient data may lead to overfitting. In this article, we propose applying a set of transformations to the collected failure data to obtain an almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimate the failure density through the efficient computation of moment generating functions, and hence, the density functions. Moreover, we develop a visualization platform to extract useful statistical information such as model-based mean time to failure. Our results indicate that for other heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure distribution follows a less complex, Argus-like distribution after the Box-Cox transformation, up to appropriate scaling and shifting operations.
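The transform, estimate, and invert workflow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the article fits a regression in the transform domain and recovers the density through moment generating functions, whereas the sketch below swaps in a plain Gaussian KDE on Box-Cox transformed lifetimes and maps it back with the change-of-variables formula. The class name `TransformedKde`, the fixed Box-Cox parameter `lam=0.2`, the rule-of-thumb bandwidth, and the synthetic lognormal lifetimes are all assumptions made for illustration only.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of a positive lifetime x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Inverse Box-Cox transform; for lam > 0 it needs lam * y + 1 > 0."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


class TransformedKde:
    """Gaussian KDE fitted to Box-Cox transformed lifetimes (hypothetical helper)."""

    def __init__(self, lifetimes, lam):
        if lam < 0:
            raise ValueError("this sketch only handles lam >= 0")
        self.lam = lam
        self.points = [box_cox(x, lam) for x in lifetimes]
        n = len(self.points)
        mean = sum(self.points) / n
        std = math.sqrt(sum((p - mean) ** 2 for p in self.points) / (n - 1))
        # Normal-reference rule-of-thumb bandwidth (an assumed choice).
        self.bandwidth = 1.06 * std * n ** (-0.2)

    def transformed_pdf(self, y):
        """KDE density in the transform domain."""
        h = self.bandwidth
        total = sum(math.exp(-0.5 * ((y - p) / h) ** 2) for p in self.points)
        return total / (len(self.points) * h * math.sqrt(2.0 * math.pi))

    def pdf(self, x):
        """Lifetime density by change of variables: f_X(x) = f_Y(g(x)) * x**(lam - 1)."""
        if x <= 0:
            return 0.0
        return self.transformed_pdf(box_cox(x, self.lam)) * x ** (self.lam - 1.0)

    def expectation(self, func, steps=600):
        """Approximate E[func(X)] with the trapezoidal rule in the transform domain."""
        h = self.bandwidth
        lo = min(self.points) - 6.0 * h
        hi = max(self.points) + 6.0 * h
        if self.lam > 0:
            # Stay inside the range where the inverse transform is defined.
            lo = max(lo, -1.0 / self.lam + 1e-9)
        step = (hi - lo) / steps
        total = 0.0
        for k in range(steps + 1):
            y = lo + k * step
            weight = 0.5 if k in (0, steps) else 1.0
            total += weight * func(inverse_box_cox(y, self.lam)) * self.transformed_pdf(y)
        return total * step


random.seed(0)
# Synthetic heavy-tailed lifetimes in hours (lognormal), NOT real disk data.
lifetimes = [random.lognormvariate(10.0, 1.0) for _ in range(300)]
model = TransformedKde(lifetimes, lam=0.2)
# Model-based mean time to failure: E[X] under the fitted density.
mttf = model.expectation(lambda x: x)
print(f"model-based MTTF: {mttf:.0f} hours")
```

Integrating in the transform domain keeps the grid uniform where the kernels are smooth, which avoids a very long integration range over the heavy right tail of the lifetimes. In practice the Box-Cox parameter would be estimated from the data rather than fixed.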
