☆ 4.5 Article

Parallel QR Factorization of Block Low-rank Matrices

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE (2022)

Journal

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE

Volume 48, Issue 3, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3538647

Keywords

Block low-rank matrix; QR factorization; householder reflections; task-based execution

Funding

JSPS KAKENHI [JP20K20624, JP21H03447]
Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures
High Performance Computing Infrastructure in Japan [jh210024-NAHI]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

The study introduces two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices, achieving arithmetic complexities of O(mn) and O(mn(1.5)) respectively. The algorithms are compared in terms of parallel task-based execution and performance, showing significant speedups compared to dense QR methods. The study also demonstrates robustness to ill conditioning and better orthogonal factors compared to existing methods.

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR structure to achieve arithmetic complexity of O(mn), while the tiled BLR-QR exhibits O(mn(1.5)) complexity. However, the tiled BLR-QR has finer task granularity that allows parallel task-based execution on shared memory systems. We compare the block-column-wise BLR-QR using fork-join parallelism with tiled BLR-QR using task-based parallelism. We also compare these two implementations of Householder BLR-QR with a block-column-wise Modified Gram-Schmidt (MGS) BLR-QR using fork-join parallelism and a state-of-the-art vendor-optimized dense Householder QR in Intel MKL. For a matrix of size 131k x 65k, all BLR methods are more than an order of magnitude faster than the dense QR in MKL. Our methods are also robust to ill conditioning and produce better orthogonal factors than the existing MGS-based method. On a CPU with 64 cores, our parallel tiled Householder and block-column-wise Householder algorithms show a speedup of 50 and 37 times, respectively.

Parallel QR Factorization of Block Low-rank Matrices

Journal

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Parallel QR Factorization of Block Low-rank Matrices

Journal

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper