Journal
JOURNAL OF SYSTEMS ARCHITECTURE
Volume 101
Publisher
ELSEVIER
DOI: 10.1016/j.sysarc.2019.101651
Keywords
Cloud computing; Fault tolerance; Checkpoint; Machine learning; Resilient architecture; Spot instance; Survival analysis
Funding
- Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES) [1441250]
- Brazilian National Council for Scientific and Technological Development (CNPq) [311301/2018-5]
The large-scale use of cloud computing resources has made the reliability of cloud environments an important issue. In addition, cloud providers now offer their unused capacity as transient servers: lower-priced virtual machines that are unreliable because the provider can revoke them without user intervention. To increase the availability of transient servers, we propose a multi-cloud fault-tolerant architecture that provides a resilient environment, using a scenario-based optimal checkpoint scheme to guarantee that processes keep running at reduced user cost. The architecture combines a heuristic that extracts information from case-based reasoning with a statistical model that predicts failure events, which helps to refine the fault tolerance parameters. The result is a cloud environment with better reliability and reduced execution time. Extensive simulations show high accuracy, with a survival prediction success rate of up to 92% and an execution time reduction of 74.58% for long-running applications. The results are promising, indicating that the proposed architecture can prevent revocation failures under realistic working conditions.
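The abstract does not say which statistical model the architecture uses for survival prediction. As an illustration only, the sketch below applies the Kaplan-Meier estimator, a standard survival analysis technique, to transient-server lifetimes. The function names (`kaplan_meier`, `survival_at`) and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], revoked: List[bool]) -> List[Tuple[float, float]]:
    """Estimate the survival curve of instance lifetimes.

    durations[i] is how long instance i was observed (e.g. in hours).
    revoked[i] is True if the observation ended with a revocation,
    False if it ended for another reason (a censored observation).
    Returns (time, survival probability) at each distinct revocation time.
    """
    # Only times at which a revocation was actually observed change the curve.
    event_times = sorted({d for d, r in zip(durations, revoked) if r})
    survival = 1.0
    curve = []
    for t in event_times:
        # Instances still running just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Instances revoked exactly at time t.
        events = sum(1 for d, r in zip(durations, revoked) if r and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the step-function survival estimate at time t."""
    s = 1.0  # before the first revocation, survival is 1
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s


# Toy data, invented for this example: five instances, two of them censored.
durations = [2, 3, 3, 5, 8]
revoked = [True, True, False, True, False]
curve = kaplan_meier(durations, revoked)
print(survival_at(curve, 4))  # estimated probability of surviving past 4 hours
```

An estimate of this kind could feed a checkpoint policy, for example by checkpointing more often as the estimated survival probability drops. How the paper's architecture actually uses its predictions is not described in the abstract.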