Article

On the diversity of multi-head attention

Journal

NEUROCOMPUTING
Volume 454, Issue -, Pages 14-24

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.04.038

Keywords

Natural language processing; Multi-head attention; Diversity; Routing-by-agreement; Neural machine translation; Sentence encoding

This paper introduces two methods, disagreement regularization and a routing-by-agreement algorithm, to better exploit the diversity of multi-head attention. Experimental results show that each method improves model performance and that combining them yields further gains.
Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions. In this work, we propose two approaches to better exploit such diversity for multi-head attention, which are complementary to each other. First, we introduce a disagreement regularization to explicitly encourage the diversity among multiple attention heads. Specifically, we propose three types of disagreement regularization, which respectively encourage the subspace, the attended positions, and the output representation associated with each attention head to be different from other heads. Second, we propose to better capture the diverse information distributed in the extracted partial-representations with the routing-by-agreement algorithm. The routing algorithm iteratively updates the proportion of how much a part (i.e. the distinct information learned from a specific subspace) should be assigned to a whole (i.e. the final output representation), based on the agreement between parts and wholes. Experimental results on the machine translation, sentence encoding and logical inference tasks demonstrate the effectiveness and universality of the proposed approaches, which indicate the necessity of better exploiting the diversity for multi-head attention. While the two strategies individually boost performance, combining them together can further improve the model performance. (c) 2021 Elsevier B.V. All rights reserved.
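
The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names, the use of cosine similarity for the output disagreement term, and the simplified routing update (softmax coupling weights, dot-product agreement, no squashing nonlinearity) are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def output_disagreement(heads):
    """Assumed disagreement score on head outputs: the negative mean
    cosine similarity over all ordered pairs of heads. Larger values
    mean the heads produce more different outputs."""
    n = len(heads)
    total = sum(cosine(heads[i], heads[j]) for i in range(n) for j in range(n))
    return -total / (n * n)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_by_agreement(votes, iterations=3):
    """Simplified iterative routing (an assumption, not the paper's exact rule).

    votes[i][j] is the vector that part i proposes for whole j.
    Returns (wholes, coupling), where coupling[i][j] is the proportion
    of part i assigned to whole j; each row of coupling sums to 1.
    """
    n_parts, n_wholes = len(votes), len(votes[0])
    dim = len(votes[0][0])
    logits = [[0.0] * n_wholes for _ in range(n_parts)]
    for _ in range(iterations):
        # Proportion of each part assigned to each whole.
        coupling = [softmax(row) for row in logits]
        # Each whole is the coupling-weighted sum of the votes it receives.
        wholes = [
            [sum(coupling[i][j] * votes[i][j][d] for i in range(n_parts))
             for d in range(dim)]
            for j in range(n_wholes)
        ]
        # Agreement (dot product) between a vote and its whole raises the logit.
        for i in range(n_parts):
            for j in range(n_wholes):
                logits[i][j] += sum(a * b for a, b in zip(votes[i][j], wholes[j]))
    return wholes, coupling


# Hypothetical toy input: parts 0 and 1 agree on whole 0, part 2 backs whole 1.
toy_votes = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
toy_wholes, toy_coupling = route_by_agreement(toy_votes)
```

In this toy input, the parts whose votes agree with a whole end up with a larger share of their weight routed to it. In a training setup, a disagreement score like the one above would typically be added to the objective as an auxiliary term so that more diverse heads are rewarded; how it is weighted is not stated in the abstract.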