Journal
ACM Transactions on Architecture and Code Optimization
Volume 15, Issue 1
Publisher
Association for Computing Machinery
DOI: 10.1145/3177885
Keywords
Convolutional neural network; deep learning; heterogeneous many-core architecture; Sunway TaihuLight supercomputer
Funding
- National Key R&D Program of China [2016YFA0602200]
- National Natural Science Foundation of China [4137411, 91530323, 61702297, 61672312]
- China Postdoctoral Science Foundation [2016M601031]
The Sunway TaihuLight supercomputer is powered by the SW26010, a new 260-core processor designed with on-chip fusion of heterogeneous cores. In this article, we present our work on optimizing the training process of convolutional neural networks (CNNs) on the Sunway TaihuLight supercomputer. Specifically, we propose a highly efficient library (swDNN) and a customized Caffe framework (swCaffe). Architecture-oriented optimization methods targeting the many-core architecture of SW26010 are introduced, achieving a 48x speedup for the convolution routine in swDNN and a 4x speedup for the complete training process of the VGG-16 network using swCaffe, compared to the unoptimized algorithm and framework. Compared to the cuDNN library and the Caffe framework on the NVIDIA K40m GPU, the proposed swDNN library and swCaffe framework on SW26010 reach nearly half the K40m's performance in single precision, and achieve 3.6x and 1.8x speedups over the K40m in double precision, respectively.
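To make the optimization target concrete, the following is a minimal sketch of a naive direct convolution (valid padding, stride 1, single channel), i.e., the kind of unoptimized baseline routine against which the 48x convolution speedup is measured. The function name, memory layout, and signature here are illustrative assumptions, not swDNN's actual API; the real library additionally applies SW26010-specific techniques such as blocking and on-chip data reuse across the 64 compute cores per core group.

```c
#include <stddef.h>

/* Naive direct 2D convolution (cross-correlation form, as in most DNN
 * frameworks): valid padding, stride 1, one input/output channel.
 * Output size is (ih-kh+1) x (iw-kw+1). Illustrative only. */
static void conv2d_naive(const float *in, size_t ih, size_t iw,
                         const float *kern, size_t kh, size_t kw,
                         float *out)
{
    size_t oh = ih - kh + 1, ow = iw - kw + 1;
    for (size_t y = 0; y < oh; y++) {
        for (size_t x = 0; x < ow; x++) {
            float acc = 0.0f;
            /* Accumulate the kernel window over the input patch. */
            for (size_t ky = 0; ky < kh; ky++)
                for (size_t kx = 0; kx < kw; kx++)
                    acc += in[(y + ky) * iw + (x + kx)] * kern[ky * kw + kx];
            out[y * ow + x] = acc;
        }
    }
}
```

The innermost multiply-accumulate loops are where many-core optimizations (register blocking, vectorization, and explicit data movement into each core's scratchpad memory) pay off, since the naive version streams the same input elements from main memory repeatedly.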