☆ 4.6 Article

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale

SIAM REVIEW (2018)

Journal

SIAM REVIEW

Volume 60, Issue 4, Pages 808-865

Publisher

SIAM PUBLICATIONS

DOI: 10.1137/17M1117732

Keywords

singular value decomposition; SVD; bidiagonal matrix; QR iteration; divide and conquer; bisection; MRRR; Jacobi method; Kogbetliantz method; Hestenes method

Funding

National Science Foundation [1339822]
Exascale Computing Project [17-SC-20-SC]
NVIDIA
Intel
Office of Advanced Cyberinfrastructure (OAC)
Direct For Computer & Info Scie & Enginr [1339822] Funding Source: National Science Foundation

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

The computation of the singular value decomposition, or SVD, has a long history with many improvements over the years, both in its implementations and algorithmically. Here, we survey the evolution of SVD algorithms for dense matrices, discussing the motivation and performance impacts of changes. There are two main branches of dense SVD methods: bidiagonalization and Jacobi. Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol60, which was subsequently ported to Fortran in the EIS-PACK library, and was later more efficiently implemented in the LINPACK library, targeting contemporary vector machines. To address cache-based memory hierarchies, the SVD algorithm was reformulated to use Level 3 BLAS in the LAPACK library. To address new architectures, ScaLAPACK was introduced to take advantage of distributed computing, and MAGMA was developed for accelerators such as GPUs. Algorithmically, the divide and conquer and MRRR algorithms were developed to reduce the number of operations. Still, these methods remained memory bound, so two-stage algorithms were developed to reduce memory operations and increase the computational intensity, with efficient implementations in PLASMA, DPLASMA, and MAGMA. Jacobi methods started with the two-sided method of Kogbetliantz and the one-sided method of Hestenes. They have likewise had many developments, including parallel and block versions and preconditioning to improve convergence. In this paper, we investigate the impact of these changes by testing various historical and current implementations on a common, modern multicore machine and a distributed computing platform. We show that algorithmic and implementation improvements have increased the speed of the SVD by several orders of magnitude, while using up to 40 times less energy.

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale

Journal

SIAM REVIEW

Publisher

SIAM PUBLICATIONS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale

Journal

SIAM REVIEW

Publisher

SIAM PUBLICATIONS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper