Journal
Publisher
NATL ACAD SCIENCES
DOI: 10.1073/pnas.2122762119
Keywords
TPUs; scientific computation; linear algebra; distributed computing; ASICs
Funding
- Cloud TPUs from Google's TPU Research Cloud
- Government of Canada through the Department of Innovation, Science and Economic Development
- Province of Ontario through the Ministry of Research, Innovation and Science
Researchers have repurposed Google's TPUs into large-scale dense linear algebra supercomputers that, using distributed matrix multiplication algorithms, can multiply two matrices of linear size N = 2^20 in about two minutes.
We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size N = 2^20 = 1,048,576 in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization.
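The abstract's third example, computing matrix functions by polynomial iteration, can be illustrated with the polar factorization A = UH. A minimal single-core sketch in NumPy, using the classic Newton–Schulz iteration (an inverse-free scheme built entirely from matrix multiplications, in the same spirit as the MXU-friendly iterations the paper describes; the function name and the choice of this particular iteration are illustrative assumptions, not the authors' exact algorithm):

```python
import numpy as np

def polar_newton_schulz(a, steps=30):
    """Approximate the orthogonal polar factor U of a = U @ H.

    Illustrative sketch: Newton-Schulz iteration
        X_{k+1} = X_k (3 I - X_k^T X_k) / 2,
    which converges when the singular values of X_0 lie in (0, sqrt(3)).
    Scaling by the spectral norm puts them in (0, 1].
    """
    x = a / np.linalg.norm(a, ord=2)  # spectral-norm scaling
    eye = np.eye(a.shape[1])
    for _ in range(steps):
        x = 0.5 * x @ (3.0 * eye - x.T @ x)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Well-conditioned test matrix
    a = np.eye(5) + 0.1 * rng.standard_normal((5, 5))
    u = polar_newton_schulz(a)
    print(np.allclose(u.T @ u, np.eye(5), atol=1e-6))
```

Because each step is only matrix multiplications and additions, the iteration keeps the hardware's matrix-multiply units busy, which is the property the paper exploits at pod scale.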