Journal
SOFTWARE-PRACTICE & EXPERIENCE
Volume 35, Issue 2, Pages 101-121
Publisher
WILEY
DOI: 10.1002/spe.626
Keywords
ATLAS; BLAS; kernel optimization; recursive optimization; linear algebra
The Basic Linear Algebra Subprograms (BLAS) define one of the most heavily used performance-critical APIs in scientific computing today. It has long been understood that the most important of these routines, the dense Level 3 BLAS, may be written efficiently given a highly optimized general matrix multiply routine. In this paper, however, we show that an even larger set of operations can be efficiently maintained using a much simpler matrix multiply kernel. Indeed, this is how our own project, ATLAS (which provides one of the most widely used BLAS implementations today), supports a large variety of performance-critical routines. Copyright (C) 2004 John Wiley & Sons, Ltd.
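The abstract's central claim is that Level 3 BLAS routines can be built on top of a matrix multiply kernel. As a minimal illustration, and not ATLAS's actual code, the Python sketch below solves a lower-triangular system L X = B (the core of the Level 3 routine TRSM) by recursive splitting, so that most of the arithmetic falls into a matrix multiply update. The function names `trsm_lower`, `trsm_base`, and `gemm_sub` and the `base` cutoff are hypothetical, and the naive loops in `gemm_sub` merely stand in for a tuned kernel.

```python
from typing import List, Optional

Matrix = List[List[float]]


def gemm_sub(L: Matrix, X: Matrix, r0: int, r1: int, k0: int, k1: int) -> None:
    """X[r0:r1, :] -= L[r0:r1, k0:k1] @ X[k0:k1, :] (row ranges must not overlap).

    Hypothetical stand-in: a real library would call an optimized GEMM kernel here.
    """
    for i in range(r0, r1):
        xi = X[i]
        for p in range(k0, k1):
            lip = L[i][p]
            if lip != 0.0:
                xp = X[p]
                for j in range(len(xi)):
                    xi[j] -= lip * xp[j]


def trsm_base(L: Matrix, X: Matrix, lo: int, hi: int) -> None:
    """Plain forward substitution on the small diagonal block L[lo:hi, lo:hi]."""
    for i in range(lo, hi):
        gemm_sub(L, X, i, i + 1, lo, i)  # subtract contributions of solved rows
        d = L[i][i]
        X[i] = [x / d for x in X[i]]


def trsm_lower(L: Matrix, X: Matrix, lo: int = 0,
               hi: Optional[int] = None, base: int = 2) -> None:
    """Solve L X = B in place (X holds B on entry) for lower-triangular L.

    Recursively splits L = [[L11, 0], [L21, L22]]:
      1. solve L11 X1 = B1
      2. B2 -= L21 X1   (a matrix multiply update)
      3. solve L22 X2 = B2
    """
    if hi is None:
        hi = len(L)
    base = max(base, 1)  # guarantees every split shrinks the problem
    if hi - lo <= base:
        trsm_base(L, X, lo, hi)
        return
    mid = lo + (hi - lo) // 2
    trsm_lower(L, X, lo, mid, base)
    gemm_sub(L, X, mid, hi, lo, mid)
    trsm_lower(L, X, mid, hi, base)


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 3.0, 0.0],
         [4.0, -1.0, 5.0]]
    B = [[2.0, 4.0], [2.5, -1.0], [18.5, 9.0]]
    X = [row[:] for row in B]  # copy, since the solve overwrites its input
    trsm_lower(L, X, base=1)
    print(X)  # [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
```

In this sketch only step 2 of each split does substantial arithmetic, so speeding up `gemm_sub` speeds up the whole solve, while the share of work left in the small diagonal blocks handled by `trsm_base` shrinks as n grows.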