On the Safe Deployment of Matrix Multiplication in Massively Parallel Safety-Related Systems

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 8

Publisher

MDPI
DOI: 10.3390/app12083779

Keywords

safety; reliability; CNN; matrix multiplication; GPU; fault detection

Funding

  1. European Union [871465]

This paper presents a safe matrix-matrix multiplication software implementation for GPUs with random hardware error-detection capabilities, which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The performance impact and achievable diagnostic coverage of these mechanisms are measured with a set of representative matrix dimensions.
Deep learning technology has enabled the development of increasingly complex safety-related autonomous systems using high-performance computers, such as graphics processing units (GPUs), which provide the computing performance required to execute parallel computing algorithms, such as matrix-matrix multiplications (a central computing element of deep learning software libraries). However, the safety certification of parallel computing software algorithms and GPU-based safety-related systems remains a challenge; one example is achieving the required fault tolerance and diagnostic coverage for random hardware errors. This paper contributes a safe matrix-matrix multiplication software implementation for GPUs with detection capabilities for random hardware errors (permanent and transient) that can be used with different architectural patterns for fault tolerance, and which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The proposed contribution is complementary to, and can be combined with, other techniques, such as algorithm-based fault tolerance. In particular, (i) we provide the high-performance matrix multiplication CUTLASS library with a catalog of diagnostic mechanisms to detect random hardware errors down to the arithmetic operation level; and (ii) we measure the performance impact incurred by the adoption of these mechanisms and their achievable diagnostic coverage with a set of representative matrix dimensions. To that end, we implement these algebraic operations targeting CUDA cores with single-instruction, multiple-thread (SIMT) math instructions on an NVIDIA Xavier NX GPU.
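
To make the idea of detecting a random hardware error in a matrix product concrete, the following is a minimal sketch of a checksum test in the style of algorithm-based fault tolerance, the complementary technique named in the abstract. It is not the paper's catalog of diagnostic mechanisms and does not use CUTLASS or a GPU. The function names (`matmul`, `column_sums`, `checksum_ok`), the small integer matrices, and the emulated bit flip are all assumptions chosen for illustration. Integer entries are assumed so that the comparison is exact; floating-point data would need a tolerance.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def column_sums(mat):
    """Sum of each column, i.e. the row vector 1^T * mat."""
    return [sum(col) for col in zip(*mat)]


def checksum_ok(a, b, c):
    """Check that the column sums of C equal (1^T A) B.

    The right-hand side is computed independently of C, so a corrupted
    element of C changes the left-hand side only and the check fails.
    """
    expected = matmul([column_sums(a)], b)[0]
    return column_sums(c) == expected


# Hypothetical example data (integers, so equality is exact).
a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8, 9], [10, 11, 12]]

c = matmul(a, b)
clean = checksum_ok(a, b, c)  # fault-free result passes the check

# Emulate a transient fault: flip one bit of a single output element.
c_faulty = [row[:] for row in c]
c_faulty[1][2] ^= 1 << 4
detected = not checksum_ok(a, b, c_faulty)
```

A check of this kind verifies the result at the matrix level with one extra vector-matrix product. The mechanisms described in the abstract instead operate down to the arithmetic operation level, which is why the two approaches can be combined.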
