Article

Quantformer: Learning Extremely Low-Precision Vision Transformers

Publisher

IEEE Computer Society
DOI: 10.1109/TPAMI.2022.3229313

Keywords

Differentiable search; group-wise discretization; network quantization; self-attention rank consistency; vision transformers

Summary

In this article, the authors propose Quantformer, an extremely low-precision vision transformer for efficient inference. It addresses the limitations of conventional network quantization methods by accounting for the properties of transformer architectures through a capacity-aware distribution and a group-wise discretization strategy. Experimental results show that Quantformer outperforms state-of-the-art methods on image classification and object detection across various vision transformer architectures. The authors also integrate Quantformer with mixed-precision quantization to further enhance performance.
Abstract

In this article, we propose extremely low-precision vision transformers called Quantformer for efficient inference. Conventional network quantization methods directly quantize the weights and activations of fully-connected layers without considering the properties of transformer architectures. Quantization sizably distorts the self-attention compared with the full-precision counterpart, and a shared quantization strategy for diversely distributed patch features causes severe quantization errors. To address these issues, we enforce the self-attention rank in quantized transformers to mimic that of full-precision counterparts with a capacity-aware distribution for information retention, and quantize patch features with a group-wise discretization strategy for quantization error minimization. Specifically, we efficiently preserve self-attention rank consistency by minimizing the distance between the self-attention in quantized and real-valued transformers with an adaptive concentration degree, where the optimal concentration degree is selected according to the self-attention entropy for model capacity adaptation. Moreover, we partition patch features along different dimensions with a differentiable group assignment, so that features in different groups leverage various discretization strategies with minimal rounding and clipping errors. Experimental results show that our Quantformer outperforms state-of-the-art network quantization methods by a sizable margin across various vision transformer architectures on image classification and object detection. We also integrate our Quantformer with mixed-precision quantization to further enhance the performance of the vanilla models.
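The rank-consistency objective can be pictured as a distillation term between quantized and full-precision attention. The sketch below is a minimal PyTorch illustration under our own assumptions: the function name, the candidate temperature set, and the entropy-based selection heuristic are hypothetical stand-ins for the paper's adaptive concentration degree, not the authors' released code.

import torch
import torch.nn.functional as F

def rank_consistency_loss(attn_q, attn_fp, taus=(0.5, 1.0, 2.0)):
    # attn_q, attn_fp: pre-softmax attention scores of shape (B, heads, N, N)
    # from the quantized and the full-precision transformer, respectively.
    probs_fp = F.softmax(attn_fp, dim=-1)
    # Mean entropy of the full-precision attention rows, used as a capacity proxy.
    ent = -(probs_fp * (probs_fp + 1e-8).log()).sum(dim=-1).mean()
    max_ent = torch.log(torch.tensor(float(attn_fp.shape[-1])))
    # Heuristic stand-in for the adaptive concentration degree: flatter
    # (higher-entropy) attention picks a larger temperature.
    idx = int((ent / max_ent * (len(taus) - 1)).round().item())
    tau = taus[idx]
    # KL divergence between the concentrated attention distributions.
    target = F.softmax(attn_fp / tau, dim=-1)
    log_pred = F.log_softmax(attn_q / tau, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean") * tau * tau

Added to the task loss during quantization-aware training, such a term pulls the quantized attention maps toward their real-valued counterparts, which is the effect the abstract describes.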
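Group-wise discretization can be sketched in the same spirit: feature channels are softly assigned to groups, and each group keeps its own learnable clipping range so rounding and clipping errors are traded off per group. The class name, the straight-through estimator, and the fixed group count below are illustrative assumptions, not details confirmed by the paper.

import torch
import torch.nn as nn

class GroupWiseQuantizer(nn.Module):
    def __init__(self, dim, num_groups=4, bits=2):
        super().__init__()
        self.levels = 2 ** bits - 1
        # Soft (differentiable) channel-to-group assignment logits.
        self.assign_logits = nn.Parameter(torch.zeros(dim, num_groups))
        # One learnable clipping threshold per group.
        self.alpha = nn.Parameter(torch.ones(num_groups))

    def forward(self, x):
        # x: patch features of shape (B, N, dim).
        assign = torch.softmax(self.assign_logits, dim=-1)  # (dim, num_groups)
        alpha = assign @ self.alpha                          # per-channel clipping range
        x_clipped = torch.clamp(x, -alpha, alpha)
        step = 2 * alpha / self.levels
        x_q = torch.round(x_clipped / step) * step
        # Straight-through estimator: forward uses the quantized values while
        # gradients flow through the clipped path to alpha and the assignment.
        return x_clipped + (x_q - x_clipped).detach()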

