Proceedings Paper

POSTER: Hardening Selective Protection across Multiple Program Inputs for HPC Applications

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3503221.3508414

Keywords

Error Resilience; Fault Injection; Compiler; High Performance Computing

Funding

  1. U.S. Department of Energy, Office of Science [DE-AC02-06CH11357]

The study finds that existing selective instruction duplication (SID) techniques lose SDC coverage in HPC applications because they are evaluated on only a single program input. To address this, the authors propose Sentinel, an automated compiler-based framework that mitigates the loss of SDC coverage across multiple inputs.
With the ever-shrinking size of transistors and the increasing scale of applications, silent data corruptions (SDCs) have become a common yet serious issue in HPC applications. Selective instruction duplication (SID) is a popular fault-tolerance technique that can achieve high SDC coverage with low performance overhead, because it prioritizes the most vulnerable parts of a program for protection. However, existing studies of SID evaluate it on a single program input, assuming that the error resilience of the program remains similar across inputs. As a result, SID suffers a drastic loss of SDC coverage when the protected program runs with different inputs. Hence, we propose Sentinel, an automated compiler-based framework to mitigate the loss of SDC coverage. Evaluation results show that Sentinel can effectively mitigate the loss of SDC coverage (up to 97.00%) across multiple inputs, which significantly hardens existing SID techniques.
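To make the coverage-loss problem concrete, the following is a minimal sketch of SID-style selection under an overhead budget. It is not Sentinel and not the authors' implementation. The instruction names (`i1` to `i4`), the per-instruction SDC probabilities, the unit costs, and the greedy benefit-per-cost ranking are all assumptions chosen for illustration. The sketch selects instructions using the profile of one input, then measures coverage under a second input whose vulnerable instructions differ.

```python
from typing import Dict, Set


def select_instructions(sdc_prob: Dict[str, float],
                        cost: Dict[str, float],
                        budget: float) -> Set[str]:
    """Greedily pick instructions to duplicate, ranked by SDC probability
    per unit of overhead, until the overhead budget is used up.
    (Hypothetical selection heuristic, assumed for illustration.)"""
    ranked = sorted(sdc_prob, key=lambda i: sdc_prob[i] / cost[i], reverse=True)
    chosen: Set[str] = set()
    spent = 0.0
    for inst in ranked:
        if spent + cost[inst] <= budget:
            chosen.add(inst)
            spent += cost[inst]
    return chosen


def sdc_coverage(protected: Set[str], sdc_prob: Dict[str, float]) -> float:
    """Fraction of the total SDC probability that falls on protected instructions."""
    total = sum(sdc_prob.values())
    if total == 0:
        return 1.0  # nothing can cause an SDC, so coverage is trivially complete
    return sum(p for i, p in sdc_prob.items() if i in protected) / total


# Hypothetical per-instruction SDC probabilities measured under two inputs.
profile_a = {"i1": 0.40, "i2": 0.30, "i3": 0.05, "i4": 0.05}
profile_b = {"i1": 0.05, "i2": 0.05, "i3": 0.40, "i4": 0.30}
unit_cost = {"i1": 1.0, "i2": 1.0, "i3": 1.0, "i4": 1.0}

# Protection is chosen using input A only, as in single-input SID evaluation.
protected = select_instructions(profile_a, unit_cost, budget=2.0)

print("protected:", sorted(protected))
print("coverage on input A:", sdc_coverage(protected, profile_a))
print("coverage on input B:", sdc_coverage(protected, profile_b))
```

In this toy setup the protection chosen from input A covers most of the SDC probability on input A but only a small fraction on input B, which is the kind of cross-input coverage loss the abstract describes.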
