Journal
APPLIED SCIENCES-BASEL
Volume 12, Issue 8
Publisher
MDPI
DOI: 10.3390/app12083779
Keywords
safety; reliability; CNN; matrix multiplication; GPU; fault detection
Funding
- European Union [871465]
This paper presents a safe matrix-matrix multiplication software implementation for GPUs with random hardware error-detection capabilities, which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The performance impact and achievable diagnostic coverage of these mechanisms are measured with a set of representative matrix dimensions.
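The following Python sketch shows, in a heavily simplified CPU setting, what detecting random hardware errors in matrix multiplication and measuring diagnostic coverage can look like. It uses plain duplication with comparison: each multiply and add is executed twice and the two results are compared. A fault-injection loop then flips one random bit of one product per trial and reports the fraction of trials in which the comparison fires. This is a generic technique chosen for illustration, not the CUTLASS-based mechanisms described in the paper. The names `dwc_matmul`, `estimate_dc`, `flip_bit` and the fault-hook interface are hypothetical. The estimate also divides detected faults by all injected faults, whereas a rigorous diagnostic-coverage figure would count dangerous faults only.

```python
import random
import struct
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
# Hypothetical fault hook: (i, j, k, copy, value) -> possibly corrupted value.
FaultHook = Callable[[int, int, int, int, float], float]


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0..63) of the IEEE-754 double encoding of x."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", float(x)))
    (y,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return y


def dwc_matmul(a: Matrix, b: Matrix,
               fault: Optional[FaultHook] = None) -> Tuple[Matrix, bool]:
    """C = A @ B with every multiply and add executed twice and compared.

    Returns (C, detected), where detected is True if any pair of redundant
    results disagreed.
    """
    n, kdim, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    detected = False
    for i in range(n):
        for j in range(m):
            acc = [0.0, 0.0]  # one accumulator per redundant copy
            for k in range(kdim):
                # Redundant multiply; the hook may corrupt either copy.
                p = [a[i][k] * b[k][j] for _ in range(2)]
                if fault is not None:
                    p = [fault(i, j, k, copy, p[copy]) for copy in range(2)]
                # Redundant add; both operations are compared right away.
                acc = [acc[copy] + p[copy] for copy in range(2)]
                if p[0] != p[1] or acc[0] != acc[1]:
                    detected = True
            c[i][j] = acc[0]
    return c, detected


def estimate_dc(a: Matrix, b: Matrix, trials: int,
                common_mode: bool, seed: int = 0) -> float:
    """Fraction of injected single-bit faults that the comparison detects.

    common_mode=False corrupts only copy 0 (e.g. a transient upset);
    common_mode=True corrupts both copies identically.
    """
    rng = random.Random(seed)
    n, kdim, m = len(a), len(b), len(b[0])
    hits = 0
    for _ in range(trials):
        target = (rng.randrange(n), rng.randrange(m), rng.randrange(kdim))
        bit = rng.randrange(64)

        def hook(i: int, j: int, k: int, copy: int, v: float) -> float:
            # Corrupt the product of one (i, j, k) multiply-accumulate.
            if (i, j, k) == target and (common_mode or copy == 0):
                return flip_bit(v, bit)
            return v

        _, detected = dwc_matmul(a, b, hook)
        hits += detected
    return hits / trials


if __name__ == "__main__":
    A = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 2.0]]
    B = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 2.0, 3.0]]
    print(estimate_dc(A, B, trials=200, common_mode=False))  # 1.0
    print(estimate_dc(A, B, trials=200, common_mode=True))   # 0.0
```

With single-copy faults every flip is caught. When the same corruption hits both copies, as a permanent fault in a shared unit might, the comparison sees identical results and the estimated coverage drops to zero, which illustrates why the architectural pattern chosen for redundancy matters.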
Deep learning technology has enabled the development of increasingly complex safety-related autonomous systems built on high-performance computing platforms such as graphics processing units (GPUs), which provide the computing performance required to execute parallel algorithms such as matrix-matrix multiplication (a central computing element of deep learning software libraries). However, the safety certification of parallel software algorithms and GPU-based safety-related systems remains a challenge, for example, achieving the required fault tolerance and diagnostic coverage for random hardware errors. This paper contributes a safe matrix-matrix multiplication software implementation for GPUs with detection capabilities for random hardware errors (both permanent and transient) that can be used with different architectural patterns for fault tolerance, and which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The proposed contribution is complementary to, and can be combined with, other techniques such as algorithm-based fault tolerance. In particular, (i) we extend the high-performance matrix multiplication library CUTLASS with a catalog of diagnostic mechanisms that detect random hardware errors down to the level of individual arithmetic operations; and (ii) we measure the performance impact of these mechanisms and their achievable diagnostic coverage for a set of representative matrix dimensions. To that end, we implement these algebraic operations targeting the CUDA cores of an NVIDIA Xavier NX GPU with single-instruction, multiple-thread (SIMT) math instructions.
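The abstract names algorithm-based fault tolerance (ABFT) as a complementary technique. As a minimal sketch of the classic checksum form of ABFT (not code from the paper or from CUTLASS), the Python below verifies a product C = AB using the identities Ce = A(Be) and e^T C = (e^T A)B, where e is the all-ones vector. The function names `matmul`, `abft_check` and the relative tolerance parameter `tol` are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop C = A @ B (stands in for the kernel under test)."""
    n, kdim, m = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(kdim)) for j in range(m)]
            for i in range(n)]


def _mismatch(actual: float, expected: float, tol: float) -> bool:
    # Relative tolerance, since floating-point checksums are rarely bit-exact.
    return abs(actual - expected) > tol * max(1.0, abs(expected))


def abft_check(a: Matrix, b: Matrix, c: Matrix,
               tol: float = 1e-9) -> Tuple[List[int], List[int]]:
    """Verify C against A and B with row and column checksums.

    Row i of C must sum to row i of A (B e); column j of C must sum to
    column j of (e^T A) B. Returns the indices of the rows and columns
    whose checksums disagree.
    """
    n, kdim, m = len(a), len(b), len(b[0])
    b_e = [sum(row) for row in b]                                # B e
    e_a = [sum(a[i][k] for i in range(n)) for k in range(kdim)]  # e^T A
    bad_rows = [i for i in range(n)
                if _mismatch(sum(c[i]),
                             sum(a[i][k] * b_e[k] for k in range(kdim)), tol)]
    bad_cols = [j for j in range(m)
                if _mismatch(sum(c[i][j] for i in range(n)),
                             sum(e_a[k] * b[k][j] for k in range(kdim)), tol)]
    return bad_rows, bad_cols


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    C = matmul(A, B)
    print(C)                    # [[19.0, 22.0], [43.0, 50.0]]
    print(abft_check(A, B, C))  # ([], [])
    C[1][0] += 1.0              # inject a single faulty element
    print(abft_check(A, B, C))  # ([1], [0])
```

Because the check needs only matrix-vector products, it costs O(n^2) work for n-by-n matrices instead of the O(n^3) of recomputing C, and a single corrupted element is located at the intersection of the flagged row and column.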