Proceedings Paper

FINJ: A Fault Injection Tool for HPC Systems

Journal

EURO-PAR 2018: PARALLEL PROCESSING WORKSHOPS
Volume 11339, Pages 800-812

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-030-10549-5_62

Keywords

Exascale systems; Resiliency; Fault detection; Monitoring; Benchmarking; Open-source

Funding

  1. Oprecomp-Open Transprecision Computing project
  2. EU [654024]

We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments. FINJ supports custom workloads and generates anomalous conditions by running fault-triggering executable programs. It can also be integrated with most lower-level fault injection tools, allowing users to create and monitor a variety of complex and diverse fault conditions in HPC systems that would be difficult to recreate in practice. FINJ is suitable for experiments involving many, potentially interacting nodes, making it a versatile design and evaluation tool.
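To make the idea of generating anomalous conditions through fault-triggering executables more concrete, the following is a minimal sketch of a timed injection loop. It is not FINJ's actual interface: the names `Task` and `run_experiment`, the log format, and the placeholder commands are all assumptions made for illustration. Each entry names a program, a start offset, and a maximum duration; the loop launches programs when due, terminates those that exceed their duration, and records every event.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical schedule entry (not FINJ's real format)."""
    name: str
    argv: list       # command line of the fault-triggering or workload program
    start: float     # seconds after the experiment begins
    duration: float  # seconds after which the program is terminated


def run_experiment(tasks, poll_interval=0.01):
    """Run tasks in start order; return a log of (timestamp, name, event)."""
    log = []
    running = []  # entries are (deadline, task, process)
    pending = sorted(tasks, key=lambda t: t.start)
    t0 = time.monotonic()
    while pending or running:
        now = time.monotonic() - t0
        # Launch every task whose start time has been reached.
        while pending and pending[0].start <= now:
            task = pending.pop(0)
            proc = subprocess.Popen(task.argv)
            running.append((now + task.duration, task, proc))
            log.append((now, task.name, "start"))
        # Collect finished programs and stop those past their deadline.
        still_running = []
        for deadline, task, proc in running:
            if proc.poll() is not None:
                log.append((now, task.name, "exit"))
            elif now >= deadline:
                proc.terminate()
                proc.wait()
                log.append((now, task.name, "terminated"))
            else:
                still_running.append((deadline, task, proc))
        running = still_running
        time.sleep(poll_interval)
    return log


if __name__ == "__main__":
    # Placeholder commands standing in for real fault programs and workloads.
    schedule = [
        Task("sleeper_fault", [sys.executable, "-c", "import time; time.sleep(30)"], 0.0, 0.5),
        Task("short_workload", [sys.executable, "-c", "pass"], 0.2, 10.0),
    ]
    for timestamp, name, event in run_experiment(schedule):
        print(f"{timestamp:7.3f}s  {name:15s} {event}")
```

In this sketch everything runs on a single machine; coordinating many nodes, as the abstract describes, would need a remote execution layer that is left out here.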
