Article

Hardware-aware approach to deep neural network optimization

Journal

NEUROCOMPUTING
Volume 559

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.126808

Keywords

DNNs; Optimization; Hardware-aware approach; LWPolar; Polar_HSPG; IHSOpti


This paper introduces a hardware-aware mechanism, IHSOpti, which optimizes deep neural networks by utilizing hardware characteristics and parallelism features. It achieves remarkable pruning ratios and improvements in running efficiency, surpassing the latest advances in the field.
Deep neural networks (DNNs) have become a pivotal technology in a myriad of fields, boasting remarkable achievements. Nevertheless, their substantial workload and inherent redundancies pose ongoing challenges for both practitioners and academia. While numerous researchers endeavor to optimize DNNs, the inherent parallelism features of hardware are generally underutilized, resulting in inefficient use of hardware resources. To address this deficit, the paper unveils a hardware-aware mechanism, IHSOpti, which incorporates hardware characteristics into software algorithms for DNN optimization. IHSOpti aims to exploit the full potential of modern hardware parallelism, with particular emphasis on pipelining mechanisms. Specifically, IHSOpti formulates an advanced sparse training algorithm, Polar_HSPG, which incorporates the newly proposed layer-wise refined polarization regularizer (LWPolar), grounded in the half-space projected gradient (HSPG). Subsequently, IHSOpti introduces a residual strategy for optimizing the layer-level redundancies of neural networks, capitalizing on the pipelining attributes inherent in current hardware. Experimental findings demonstrate that IHSOpti attains outstanding pruning ratios in both parameters and FLOPs: up to 96.90% and 82.73% with 93.34% accuracy for VGGBN, 97.69% and 95.24% with 94.69% accuracy for ResNet, and 98.07% and 97.80% with 95.73% accuracy for the cutting-edge network RegNet. Notably, running efficiency improves markedly, with accelerations ranging from 3.63x to 8.20x on CPUs and 1.22x to 2.25x on GPUs. These outcomes surpass the latest advances in the field. By incorporating specific hardware characteristics, IHSOpti provides a comprehensive and effective approach to harnessing the intrinsic parallelism of contemporary hardware platforms for DNNs.
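To make the idea of structured pruning concrete, the sketch below zeroes out whole channels whose group L2 norm falls below a keep threshold. This is a generic magnitude-based illustration of channel-level sparsification, not the paper's LWPolar/Polar_HSPG procedure; the function name and data layout are assumptions for the example.

```python
import math

def prune_channels(weight, keep_ratio):
    """Zero out the weakest channels of a layer by group L2 norm.

    weight: list of channels, each a list of floats
            (e.g. one row per output channel).
    keep_ratio: fraction of channels to retain (0 < keep_ratio <= 1).

    Illustrative magnitude-based pruning only; the paper's method
    (Polar_HSPG with the LWPolar regularizer) learns sparsity during
    training rather than thresholding a trained layer.
    """
    # Group norm per channel: channels with small norms carry little signal.
    norms = [math.sqrt(sum(x * x for x in ch)) for ch in weight]
    k = max(1, round(keep_ratio * len(weight)))
    # Indices of the k strongest channels.
    keep = set(sorted(range(len(weight)), key=lambda i: norms[i])[-k:])
    mask = [i in keep for i in range(len(weight))]
    # Pruned channels are zeroed; kept channels pass through unchanged.
    pruned = [ch if m else [0.0] * len(ch) for ch, m in zip(weight, mask)]
    return pruned, mask
```

Zeroed channels can then be removed entirely, which is what yields the parameter and FLOP reductions the abstract reports; the layer-level residual strategy additionally exploits hardware pipelining, which this weight-level sketch does not model.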

