Article

Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3177885

Keywords

Convolutional neural network; deep learning; heterogeneous many-core architecture; Sunway TaihuLight supercomputer

Funding

  1. National Key R&D Program of China [2016YFA0602200]
  2. National Natural Science Foundation of China [4137411, 91530323, 61702297, 61672312]
  3. China Postdoctoral Science Foundation [2016M601031]

The Sunway TaihuLight supercomputer is powered by the SW26010, a new 260-core processor designed with on-chip fusion of heterogeneous cores. In this article, we present our work on optimizing the training process of convolutional neural networks (CNNs) on the Sunway TaihuLight supercomputer. Specifically, we propose a highly efficient library (swDNN) and a customized Caffe framework (swCaffe). We introduce architecture-oriented optimization methods targeting the many-core architecture of the SW26010, which achieve a 48x speedup for the convolution routine in swDNN and a 4x speedup for the complete training process of the VGG-16 network using swCaffe, compared to the unoptimized algorithm and framework. Compared to the cuDNN library and the Caffe framework on the NVIDIA K40m GPU, the proposed swDNN library and swCaffe framework on the SW26010 reach nearly half the K40m's performance in single precision and achieve 3.6x and 1.8x speedups over the K40m in double precision, respectively.
