Proceedings Paper

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/IPDPS.2013.80

Keywords

communication-avoiding algorithms; linear algebra; matrix multiplication

Funding

  1. Microsoft [024263]
  2. Intel [024894]
  3. U.C. Discovery [DIG07-10227]
  4. DOE [DE-SC0004938, DE-SC0005136, DE-SC0003959, DE-SC0008700, AC02-05CH11231]
  5. DARPA [HR0011-12-2-0016]
  6. Office of Science of the U.S. Department of Energy [DE-AC02-05CH11231]

Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache- and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared- and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
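The core idea of the dimension-splitting recursion can be illustrated with a short sequential sketch. This is an assumption-laden illustration, not the authors' parallel implementation: it shows only the split-the-largest-dimension rule (halving m or n partitions independent subproblems; halving the shared dimension k produces two products that are summed), with the parallel BFS/DFS scheduling and communication machinery omitted. The function name `rect_matmul` and the `base` cutoff are hypothetical.

```python
import numpy as np

def rect_matmul(A, B, base=64):
    # Multiply an (m x k) matrix A by a (k x n) matrix B by recursively
    # splitting whichever of m, k, n is currently largest.
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must agree"
    if max(m, k, n) <= base:
        return A @ B  # small enough: fall back to a direct multiply
    C = np.empty((m, n))
    if m >= k and m >= n:
        # Split the rows of A: the two halves of C are independent.
        h = m // 2
        C[:h] = rect_matmul(A[:h], B, base)
        C[h:] = rect_matmul(A[h:], B, base)
    elif n >= k:
        # Split the columns of B: again two independent subproblems.
        h = n // 2
        C[:, :h] = rect_matmul(A, B[:, :h], base)
        C[:, h:] = rect_matmul(A, B[:, h:], base)
    else:
        # Split the shared dimension k: the two partial products
        # touch the same output and must be summed.
        h = k // 2
        C = rect_matmul(A[:, :h], B[:h], base) + \
            rect_matmul(A[:, h:], B[h:], base)
    return C
```

In the parallel algorithm, each split is executed either breadth-first (both subproblems in parallel on disjoint processor sets) or depth-first (sequentially, reusing memory), and choosing between the two is what yields the communication-optimality the abstract describes.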
