4.7 Article

Data-driven mixed-Integer linear programming-based optimisation for efficient failure detection in large-scale distributed systems

期刊

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
卷 303, 期 1, 页码 337-353

出版社

ELSEVIER
DOI: 10.1016/j.ejor.2022.02.006

关键词

Nonlinear programming; Mixed integer linear programming; Distributed systems; Failure detection; Heartbeats

向作者/读者索取更多资源

In this paper, a new Mixed Integer Linear Programming (MILP) optimization-based failure detector (FD) algorithm is proposed. The MILP formulation is obtained via piecewise linearization relaxations and aims to find optimal FD parameters that meet the desired system requirements. The proposed approach improves overall FD performance and scalability by considering network conditions and system parameters as constraints and adapting to real-time network changes. The results obtained from testing in a realistic environment show consistent improvement in the performance and scalability of the FD. This paper is the first attempt to combine MILP-based optimization modeling with FD to achieve performance guarantees.
Failure detectors (FDs) are fundamental building blocks for distributed systems. An FD detects whether a process has crashed or not based on the reception of heartbeats' messages sent by this process over a communication channel. A key challenge of FDs is to tune their parameters to achieve optimal performance which satisfies the desired system requirements. This is challenging due to the complexities of large-scale networks. Existing FDs ignore such optimisation and adopt ad-hoc parameters. In this paper, we propose a new Mixed Integer Linear Programming (MILP) optimisation-based FD algorithm. We obtain the MILP formulation via piecewise linearisation relaxations. The MILP involves obtaining optimal FD parameters that meet the optimal trade-off between its performance metrics requirements, network conditions and system parameters. The MILP maximises our FD's accuracy under bounded failure detection time while considering network and system conditions as constraints. The MILP's solution represents optimised FD parameters that maximise FD's expected performance. To adapt to real-time network changes, our proposed MILP-based FD fits the probability distribution of heartbeats' inter-arrivals. To address our FD scalability challenge in large-scale systems where the MILP model needs to compute approximate optimal solutions quickly, we also propose a heuristic algorithm. To test our proposed approach, we adopt Amazon Cloud as a realistic testing environment and develop a simulator for robustness tests. Our results show consistent improvement of overall FD performance and scalability. To the best of our knowledge, this is the first attempt to combine the MILP-based optimisation modelling with FD to achieve performance guarantees.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据