4.7 Article

Availability Analysis of Systems Deploying Sequences of Environmental-Diversity-Based Recovery Methods

Journal

IEEE TRANSACTIONS ON RELIABILITY
Volume 70, Issue 3, Pages 1126-1142

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TR.2020.3023032

Keywords

Computer bugs; Fault tolerance; Fault tolerant systems; Rail to rail inputs; Servers; Software systems; imperfect coverage; Mandelbug; recovery methods; semi-Markov process (SMP)

Funding

  1. National Natural Science Foundation of China [61772055, 61872169]
  2. Technical Foundation Project of Ministry of Industry and Information Technology of China [JSZL2016601B003]
  3. Equipment Preliminary R&D Project of China [41402020102]

This article examines the threat that Mandelbug-caused software failures pose to system availability. It develops an analytic model of systems that deploy a sequence of environmental-diversity-based recovery methods, together with an open-source tool for calculating the optimal system availability. The results indicate that recovery methods based on environmental diversity can effectively enhance system availability.
Mandelbug-caused software failures are significant threats to system availability, especially in mission-critical and safety-critical systems. However, there is still no systematic method for keeping software free of Mandelbugs before release. To guarantee the availability of systems suffering from Mandelbugs, environmental-diversity-based fault tolerance techniques have been proposed to recover from the failures they cause. In this article, we develop and study an analytic model to assess the availability of systems that utilize a sequence of environmental-diversity-based recovery methods. Improving on previous studies, the availability formula we obtain works for any number of recovery methods the system is equipped with; it is also independent of both the nature of those recovery methods and the order in which they are used. In addition, we consider the problem of how to arrange the set of available recovery methods to achieve the largest system availability. Based on the results of our analysis, we develop an open-source tool, called OPENS, which assists in the calculation of the optimal system availability. We validate the effectiveness of the proposed modeling approach in two ways: by comparing our results with those obtained for specific systems considered in related studies, and by conducting numerical analyses for more general application scenarios.
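
To make the ordering problem concrete, the following is a minimal sketch under a deliberately simplified model. It is not the article's semi-Markov formula and not the OPENS tool. It assumes that each recovery method has a success probability (coverage) and a mean duration, that methods are tried one after another until one succeeds, and that a final repair of fixed mean duration always succeeds if every method fails. All names (`expected_downtime`, `availability`, `best_order`, `mttf`, `final_repair_time`) are hypothetical.

```python
from itertools import permutations


def expected_downtime(order, final_repair_time):
    """Mean downtime per failure for one ordering (simplified, assumed model).

    order: sequence of (coverage, mean_duration) pairs, tried left to right.
    final_repair_time: mean duration of a last-resort repair that is assumed
    to always succeed when every recovery method has failed.
    """
    downtime = 0.0
    p_reach = 1.0  # probability that all earlier methods failed
    for coverage, duration in order:
        downtime += p_reach * duration   # this method runs only if reached
        p_reach *= 1.0 - coverage        # it fails with probability 1 - coverage
    return downtime + p_reach * final_repair_time


def availability(order, mttf, final_repair_time):
    """Steady-state availability as uptime / (uptime + downtime)."""
    return mttf / (mttf + expected_downtime(order, final_repair_time))


def best_order(methods, mttf, final_repair_time):
    """Brute-force search over all orderings; feasible only for small sets."""
    return max(
        permutations(methods),
        key=lambda order: availability(order, mttf, final_repair_time),
    )


if __name__ == "__main__":
    # Hypothetical numbers: a slow, reliable method and a fast, unreliable one.
    methods = [(0.9, 1.0), (0.5, 0.1)]
    order = best_order(methods, mttf=1000.0, final_repair_time=10.0)
    print(order, availability(order, 1000.0, 10.0))
```

Exhaustive search over permutations grows factorially with the number of methods, so this sketch only illustrates the question the article addresses analytically.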
