4.3 Article

A Case For Intra-rack Resource Disaggregation in HPC

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3514245

Keywords

Disaggregation; HPC; utilization; memory; LDMS

Funding

  1. ARPA-E ENLITENED Program [DE-AR00000843]
  2. Office of Science of the U.S. Department of Energy [DE-AC02-05CH11231]
  3. National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory [DE-AC02-05CH11231]

With the emergence of specialized accelerators, HPC systems need more flexible resource allocation. This study analyzes NERSC's Cori system and profiles deep-learning applications to investigate hardware resource disaggregation. The results indicate that, with intra-rack disaggregation, a CPU can almost always find the resources it requires inside its own rack, and that the memory and NIC resources a rack must provision can be reduced.
The expected halt of traditional technology scaling is motivating increased heterogeneity in high-performance computing (HPC) systems with the emergence of numerous specialized accelerators. As heterogeneity increases, so does the risk of underutilizing expensive hardware resources if we preserve today's rigid node configuration and reservation strategies. This has sparked interest in resource disaggregation to enable finer-grained allocation of hardware resources to applications. However, there is currently no data-driven study of what range of disaggregation is appropriate in HPC. To that end, we perform a detailed analysis of key metrics sampled in NERSC's Cori, a production HPC system that executes a diverse open-science HPC workload. In addition, we profile a variety of deep-learning applications to represent an emerging workload. We show that for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack. In addition, ideal intra-rack resource disaggregation in Cori could reduce memory and NIC resources by 5.36% to 69.01% and still satisfy the worst-case average rack utilization.
