3.9 Article

Noise in the Clouds: Influence of Network Performance Variability on Application Scalability

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3570609

Keywords

cloud; HPC; network noise; scalability

Funding

  1. European Research Council (ERC) grant PSAP [101002047]
  2. European Union [955606, 955776]
  3. ETH Postdoctoral Fellowship [19-2 FEL-50]

This paper analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems. It examines latency, bandwidth, and collective communication patterns at different scales and assesses how network and OS noise affect performance and cost.
Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems. First, we consider latency, bandwidth, and collective communication patterns in detailed small-scale measurements, and then we simulate network performance at a larger scale. We validate our approach on four popular cloud providers and three on-premise HPC systems, showing that network (and also OS) noise can significantly impact performance and cost at both small and large scale.
