3.8 Proceedings Paper

Benchmarking the Performance of Accelerators on National Cyberinfrastructure Resources for Artificial Intelligence/Machine Learning Workloads

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3491418.3530772

Keywords

ResNet50; ACES (Accelerating Computing for Emerging Sciences); Expanse; Graphics Processing Unit; Intelligence Processing Unit; PopVision; Classification; Convolution Neural Network; Optimization; Frontera; LoneStar6

Funding

  1. National Science Foundation (NSF) [2112356, 1925764, 2019136, 2019129]

Abstract

This paper compares the performance of two different accelerator architectures, GPUs and IPUs, on AI/ML workflows and finds that they deliver similar performance and scalability. However, owing to differences in their memory and processing structures, IPUs are efficient with smaller batch sizes, while GPUs benefit from larger batch sizes.
Upcoming regional and National Science Foundation (NSF)-funded Cyberinfrastructure (CI) resources will give researchers opportunities to run their artificial intelligence / machine learning (AI/ML) workflows on accelerators. To effectively leverage this burgeoning CI-rich landscape, researchers need extensive benchmark data to maximize performance gains and map their workflows to appropriate architectures. These data will further assist CI administrators, NSF program officers, and CI allocation reviewers in making informed determinations on CI-resource allocations. Here, we compare the performance of two very different architectures, the commonly used Graphics Processing Units (GPUs) and the new generation of Intelligence Processing Units (IPUs), by running training benchmarks of common AI/ML models. We leverage the maturity of the software stacks and the ease of migration among these platforms to find that performance and scaling are similar for both architectures. Exploring training parameters such as batch size, however, reveals that, owing to their memory and processing structures, IPUs run efficiently with smaller batch sizes, while GPUs benefit from large batch sizes to extract sufficient parallelism in neural network training and inference. This comes with different advantages and disadvantages, as discussed in this paper. As such, considerations of inference latency, inherent parallelism, and model accuracy will play a role in researchers' selection of these architectures. The impact of these choices on a representative image compression model system is discussed.
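
To make the batch-size exploration concrete, below is a minimal, hypothetical sketch of the kind of throughput measurement the abstract describes: ResNet-50 training steps timed at several batch sizes on a single device. It uses PyTorch with synthetic data and is not the authors' benchmark code; an equivalent IPU run would instead go through Graphcore's PopTorch and could be profiled with PopVision.

# Hypothetical sketch (not the authors' code): measure ResNet-50 training
# throughput for several batch sizes on a single accelerator. Synthetic
# inputs stand in for an ImageNet-style dataset.
import time
import torch
import torchvision

def train_throughput(batch_size, steps=20, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = torchvision.models.resnet50(num_classes=1000).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    # Synthetic ImageNet-sized inputs; a real benchmark would stream a dataset.
    images = torch.randn(batch_size, 3, 224, 224, device=device)
    labels = torch.randint(0, 1000, (batch_size,), device=device)

    model.train()
    for _ in range(3):  # warm-up steps, excluded from timing
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return steps * batch_size / (time.perf_counter() - start)  # images per second

if __name__ == "__main__":
    for bs in (16, 32, 64, 128):
        print(f"batch size {bs:4d}: {train_throughput(bs):8.1f} images/s")

Plotting images per second against batch size for each accelerator would reproduce the qualitative comparison summarized above, with IPUs expected to reach their peak throughput at smaller batch sizes than GPUs.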

