Article

Performance portability in a real world application: PHAST applied to Caffe

Publisher

SAGE PUBLICATIONS LTD
DOI: 10.1177/10943420221077107

Keywords

High-performance computing; performance portability; heterogeneous computing; machine learning

Funding

  1. AEI (State Research Agency, Spain)
  2. ERDF (European Regional Development Fund, EU) [RTI2018-098156-B-C53]


This paper applies the PHAST Library to the Caffe framework to obtain a single-source implementation. By optimizing both the Caffe source code and the PHAST Library itself, the PHAST implementation achieves performance portability on CPU and GPU. The PHAST version of Caffe matches the original version on MNIST and outperforms it on CIFAR-10.
This work covers the application of the PHAST Library, a hardware-agnostic programming library, to a real-world application, the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another to run on GPUs. With PHAST, we aim to develop a single-source implementation capable of running efficiently on both CPU and GPU. We start by analyzing the performance of a basic implementation of Caffe using PHAST. We then detail possible performance improvements. We find that the overall performance is dominated by a few 'heavy' layers. In refining the inefficient parts of this version, we follow two approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, which ultimately translate into improved performance of the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe provides the same or even better performance than the original version of Caffe built from two different codebases. For the MNIST database, the PHAST implementation takes an equivalent amount of time to native code on both CPU and GPU. Furthermore, with the CIFAR-10 database, PHAST achieves speedups of 51% and 49% over native code on CPU and GPU, respectively. These results open a new horizon for software development in the upcoming heterogeneous computing era.
