Article

Deeper Weight Pruning Without Accuracy Loss in Deep Neural Networks: Signed-Digit Representation-Based Approach

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TCAD.2021.3064914

Keywords

Parallel processing; Hardware; Neural networks; Acceleration; Performance evaluation; Computational modeling; Shortest path problem; Bit-level parallelism; deep neural networks (DNNs); signed-digit representation; weight pruning

Funding

  1. IC Design Education Center (IDEC), South Korea

Abstract

In addition to word-level weight pruning, which excludes zero-valued weights from the neural network inference computation, it has recently been demonstrated that bit-level weight pruning, which excludes the 0-bits in the weight value representation regardless of whether the weight values themselves are zero, is very effective for further accelerating neural network computation without accuracy loss. This work overcomes an inherent limitation of bit-level weight pruning: the maximal computation speedup is bounded by the total number of nonzero bits of the weights, and this bound has invariably been regarded as uncontrollable (i.e., constant) for the neural network to be pruned. Specifically, based on signed-digit encoding, this work 1) proposes a transformation technique that converts the two's-complement representation of every weight into a set of signed-digit representations with the minimal number of essential (i.e., nonzero) bits; 2) formulates the problem of selecting the signed-digit representations of weights that maximize the parallelism of bit-level multiplication on the weights as a shortest path problem, so as to achieve maximal digit-index by digit-index (i.e., columnwise) compression of the weights, and solves it efficiently with an approximation algorithm; 3) proposes a novel supporting acceleration architecture (DWP) that requires no nontrivial additional hardware; and 4) proposes a variant of DWP that supports bit-level parallel multiplication while predicting a tight worst-case latency for the parallel processing. Experiments on several representative models with the ImageNet dataset show that the proposed approach reduces the number of essential bits by 69% on AlexNet, 74% on VGG-16, and 68% on ResNet-152, whereby the accelerator reduces the inference computation time by up to 3.57x over conventional bit-level weight pruning.
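The minimal-essential-bit conversion in step 1 is closely related to classical canonical signed-digit (CSD, also called non-adjacent form) recoding, which is known to minimize the number of nonzero digits among all signed-digit representations of a value. A minimal sketch of that standard recoding follows; it is an illustration of the general idea only, not the authors' transformation, which produces a whole set of candidate minimal representations rather than a single one.

```python
def to_csd(w):
    """Recode integer w into canonical signed-digit (CSD) form.

    Returns a list of digits in {-1, 0, +1}, least-significant first,
    with the minimal number of nonzero (essential) digits for w.
    """
    digits = []
    while w != 0:
        if w & 1:
            # Pick +1 or -1 so that (w - d) is divisible by 4; this
            # forces the next digit to be 0 (the non-adjacent property).
            d = 2 - (w & 3)  # w mod 4 == 1 -> +1, w mod 4 == 3 -> -1
            w -= d
        else:
            d = 0
        digits.append(d)
        w >>= 1
    return digits

# Example: 7 = 111 in binary (3 nonzero bits) becomes
# 100(-1) in signed digits, i.e. 8 - 1, with only 2 essential bits.
csd = to_csd(7)
value = sum(d << i for i, d in enumerate(csd))
```

Reducing essential bits this way directly loosens the speedup bound the paper targets, since fewer nonzero digits means fewer bit-level multiply-accumulate steps per weight.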

