Article

No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks

Journal

SENSORS
Volume 21, Issue 16

Publisher

MDPI
DOI: 10.3390/s21165599

Keywords

matrix factorization; neural network compression; robust low-rank approximation; Löwner ellipsoid


The study introduces an algorithm for compressing neural networks that uses modern techniques in computational geometry to compute a k-rank ℓp approximation, for p ∈ [1,2], in place of the standard k-rank ℓ2 (SVD) approximation, yielding effective compression without fine-tuning. Experimental results on the GLUE benchmark confirm the practicality and theoretical advantage of this method for compressing networks such as BERT, DistilBERT, XLNet, and RoBERTa.
A common technique for compressing a neural network is to compute, via SVD, the k-rank ℓ2 approximation A_k of the matrix A ∈ ℝ^(n×d) that corresponds to a fully connected layer (or embedding layer). Here, d is the number of input neurons in the layer, n is the number of neurons in the next one, and A_k is stored in O((n+d)k) memory instead of O(nd). Then, a fine-tuning step is used to improve this initial compression. However, end users may not have the required computational resources, time, or budget to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks with a similar initial compression time (to common techniques) but without the fine-tuning step. The main idea is to replace the k-rank ℓ2 approximation with the ℓp approximation, for p ∈ [1,2], which is known to be less sensitive to outliers but much harder to compute. Our main technical result is a practical and provable approximation algorithm to compute it for any p ≥ 1, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
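For context, the sketch below illustrates the standard k-rank ℓ2 (SVD) compression of a fully connected layer that the paper's ℓp method is designed to replace; it is not the paper's algorithm. The function name, the layer shapes, and the choice of k are illustrative assumptions.

```python
# Minimal sketch of the baseline: k-rank l2 (SVD) compression of a dense layer.
# This is the standard technique the paper improves upon, not the paper's lp algorithm.
import numpy as np

def svd_compress(A: np.ndarray, k: int):
    """Return factors (L, R) with L @ R ~= A, the best k-rank l2 approximation.

    A has shape (n, d): d input neurons, n neurons in the next layer.
    Storing L (n x k) and R (k x d) takes O((n + d) k) memory instead of O(n d).
    """
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * S[:k]   # (n, k), singular values folded into the left factor
    R = Vt[:k, :]          # (k, d)
    return L, R

# Usage: replace one dense layer y = A @ x with two thinner layers y = L @ (R @ x).
A = np.random.randn(768, 3072)   # e.g. a transformer feed-forward weight matrix (assumed shape)
L, R = svd_compress(A, k=64)
error = np.linalg.norm(A - L @ R)  # Frobenius (l2) error, which SVD minimizes
```

The paper's contribution is to compute a k-rank factorization that minimizes an ℓp error, p ∈ [1,2], instead of this ℓ2 error, which is less sensitive to outlier entries and makes the compressed network accurate enough to skip fine-tuning.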

