Article

Tutorial on systems with antifragility to downtime

Journal

COMPUTING
Volume 104, Issue 1, Pages 73-93

Publisher

SPRINGER WIEN
DOI: 10.1007/s00607-020-00895-6

Keywords

Antifragility; Distributed systems; Design principles; Uptime

The article discusses how to design and operate socio-technical systems with antifragility in mind so that they avoid downtime. It emphasizes three practices: splitting the software into separate processes, having those processes communicate asynchronously, and injecting artificial failures into the production system to detect vulnerabilities and adapt to changes in the system and its environment. Following these design and operational principles helps minimize incidents and keep uptime high.
An antifragile system of software and stakeholders, including designers, developers, and operators, learns from incidents how to avoid outages and maintain high uptime. This tutorial article reviews how to design and operate such socio-technical systems with antifragility to downtime. It documents the importance of four design principles and two operational principles by exploring the polar opposite anti-principles and the interplay between the principles and the anti-principles. The design principles mandate a software design of separate and isolatable processes with sufficient diversity and redundancy. The processes should communicate asynchronously over an external network. The operational principles imply that the software development teams should repeatedly inject artificial failures into the production system to understand its behavior and to detect and mitigate vulnerabilities as the system and its environment change.
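
The principles named in the abstract can be illustrated with a small sketch. It is not taken from the article: the names Replica, call_with_failover, and experiment are hypothetical, and in-process asyncio queues stand in for separate processes communicating over an external network. Under those assumptions, it shows redundant workers that receive requests asynchronously, a caller that fails over when a worker does not answer in time, and a loop that deliberately breaks one worker per request to check that the service still responds.

```python
import asyncio
import random


class Replica:
    """Hypothetical isolated worker; callers reach it only through its inbox."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.failed = False  # toggled by the failure-injection loop below

    async def run(self) -> None:
        while True:
            payload, reply = await self.inbox.get()
            if self.failed or reply.done():
                continue  # a failed replica silently drops the request
            reply.set_result(f"{self.name} handled {payload}")


async def call_with_failover(replicas, payload, timeout: float = 0.05) -> str:
    """Send the request to each redundant replica in turn until one answers."""
    loop = asyncio.get_running_loop()
    for replica in replicas:
        reply = loop.create_future()
        await replica.inbox.put((payload, reply))  # asynchronous message passing
        try:
            return await asyncio.wait_for(reply, timeout)
        except asyncio.TimeoutError:
            continue  # no answer in time: treat the replica as unavailable
    raise RuntimeError("all replicas unavailable")


async def experiment(seed: int = 0, requests: int = 5) -> list:
    """Inject one artificial failure per request and record the responses."""
    rng = random.Random(seed)
    replicas = [Replica(f"replica-{i}") for i in range(3)]
    tasks = [asyncio.create_task(r.run()) for r in replicas]
    results = []
    try:
        for request_id in range(requests):
            victim = rng.choice(replicas)
            victim.failed = True  # artificial failure (assumed fault model)
            results.append(await call_with_failover(replicas, request_id))
            victim.failed = False  # the replica recovers before the next request
    finally:
        for task in tasks:
            task.cancel()
    return results


if __name__ == "__main__":
    for line in asyncio.run(experiment()):
        print(line)
```

Because only one of the three replicas is failed at a time, every request in this sketch is expected to be served by a healthy replica. In a real deployment the replicas would be separate processes or hosts, and the failure injection would target the production system as the abstract describes.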
