Article

Distributed deep learning system for cancerous region detection on Sunway TaihuLight

Publisher

Springer Nature
DOI: 10.1007/s42514-020-00046-5

Keywords

Deep neural network; Parameter server; Ring all-reduce; Cancerous region detection

Funding

  1. National Key Research and Development Program of China [2016YFB1000403]
  2. Fundamental Research Funds for the Central Universities of China

To explore the potential of distributed training for deep neural networks, we implement several distributed algorithms on top of swFlow on the world-leading supercomputer Sunway TaihuLight. Starting from two naive designs, parameter server and ring all-reduce, we show the limitations of their communication models and discuss optimizations that adapt them to the five-level interconnect architecture of the Sunway system. To reduce the communication bottleneck on large-scale systems, multi-server and hierarchical ring all-reduce models are introduced. On a benchmark taken from a deep learning-based cancerous region detection algorithm, the average parallel efficiency exceeds 80% on up to 1024 processors. This reveals a great opportunity for combining deep learning with HPC systems.
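To make the two all-reduce variants named in the abstract concrete, the following is a minimal single-process sketch in Python. It simulates a flat ring all-reduce (a reduce-scatter phase followed by an all-gather phase) and a two-level hierarchical variant. It is not the swFlow implementation: the function names, the `group_size` parameter, and the reduce-to-leader and broadcast steps inside each group are assumptions chosen for illustration, and the abstract does not specify how the hierarchy maps onto the five-level interconnect.

```python
from typing import List


def ring_all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring all-reduce over p workers (one vector per worker)."""
    p = len(grads)
    n = len(grads[0])
    # Split each vector into p nearly equal chunks.
    bounds = [(i * n) // p for i in range(p + 1)]
    data = [list(g) for g in grads]

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % p
    # to its right neighbour, which adds it to its own copy.
    for s in range(p - 1):
        # Snapshot all payloads first so the exchange behaves as simultaneous.
        sends = []
        for r in range(p):
            c = (r - s) % p
            sends.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r in range(p):
            dst = (r + 1) % p
            c, payload = sends[r]
            for k, v in enumerate(payload):
                data[dst][bounds[c] + k] += v

    # Worker r now holds the fully reduced chunk (r + 1) % p.
    # Phase 2, all-gather: circulate the reduced chunks around the ring.
    for s in range(p - 1):
        sends = []
        for r in range(p):
            c = (r + 1 - s) % p
            sends.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r in range(p):
            dst = (r + 1) % p
            c, payload = sends[r]
            data[dst][bounds[c]:bounds[c + 1]] = payload
    return data


def hierarchical_all_reduce(grads: List[List[float]],
                            group_size: int) -> List[List[float]]:
    """Two-level variant (assumed layout): reduce inside each group to a
    leader, ring all-reduce among leaders, then broadcast inside each group."""
    p = len(grads)
    n = len(grads[0])
    groups = [list(range(i, min(i + group_size, p)))
              for i in range(0, p, group_size)]
    # Step 1: intra-group reduction to one leader vector per group.
    leader_sums = [[sum(grads[r][k] for r in grp) for k in range(n)]
                   for grp in groups]
    # Step 2: ring all-reduce across the (much shorter) ring of leaders.
    reduced = ring_all_reduce(leader_sums)
    # Step 3: each leader broadcasts the result to its group members.
    out: List[List[float]] = [[] for _ in range(p)]
    for grp, vec in zip(groups, reduced):
        for r in grp:
            out[r] = list(vec)
    return out
```

In this sketch the flat ring needs 2(p - 1) communication steps for p workers, while the hierarchical variant runs the ring only over the group leaders, so the ring length shrinks from p to roughly p / `group_size`. That shorter ring is the intuition behind using a hierarchy on a large machine; the actual cost on Sunway TaihuLight depends on details the abstract does not give.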
