Article

MS-Transformer: Introduce multiple structural priors into a unified transformer for encoding sentences

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 72

Publisher

ACADEMIC PRESS LTD - ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2021.101304

Keywords

Sentence representation; Transformer; Natural language processing

Funding

  1. Key Development Program of the Ministry of Science and Technology, China [2019YFF0303003]
  2. National Natural Science Foundation of China [61976068]
  3. "Hundreds, Millions" Engineering Science and Technology Major Special Project of Heilongjiang Province, China [2020ZX14A02]

Abstract

This paper introduces the Multiple Structural Priors Guided Transformer (MS-Transformer), which integrates different types of structural priors into Transformers through a novel multi-head attention mechanism in order to capture the rich structural information of texts. Experimental results show that MS-Transformer significantly outperforms other strong models.
Transformers have been widely used in recent NLP studies. Unlike CNNs or RNNs, the vanilla Transformer is position-insensitive and thus incapable of capturing the structural priors among sequences of words. Existing studies commonly apply a single mask strategy to the Transformer to incorporate one structural prior, failing to model the more abundant structural information of texts. In this paper, we aim to introduce multiple types of structural priors into Transformers, proposing the Multiple Structural Priors Guided Transformer (MS-Transformer), which maps different structural priors to different attention heads through a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative positions of words. To capture the latent hierarchical structure of texts, we extract this information not only from word contexts but also from dependency syntax trees. Experimental results on three tasks show that MS-Transformer achieves significant improvements over strong baselines.
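
To make the multi-mask idea concrete, below is a minimal PyTorch sketch of a multi-head attention layer in which each head receives its own structural-prior mask: a local sequential-order window or a dependency-tree neighbourhood. This is an illustration of the general mechanism the abstract describes, not the authors' released implementation; the names (MultiMaskAttention, sequential_mask, dependency_mask) and the exact mask constructions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiMaskAttention(nn.Module):
    """Multi-head self-attention where every head gets its own boolean mask,
    so different heads can encode different structural priors
    (hypothetical sketch of a multi-mask multi-head attention)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, head_masks: torch.Tensor) -> torch.Tensor:
        # x:          (batch, seq_len, d_model)
        # head_masks: (num_heads, seq_len, seq_len), True where attention is allowed
        batch, seq_len, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (batch, seq_len, self.num_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Broadcast one mask per head across the batch dimension.
        scores = scores.masked_fill(~head_masks.unsqueeze(0), float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, -1)
        return self.proj(out)


def sequential_mask(seq_len: int, window: int) -> torch.Tensor:
    """Relative-position prior: each token attends only to tokens within
    `window` positions of itself (the diagonal is always kept)."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window


def dependency_mask(heads: list) -> torch.Tensor:
    """Syntactic prior: token i attends to itself, its dependency head,
    and its children. heads[i] is the parent index of token i (-1 = root)."""
    n = len(heads)
    mask = torch.eye(n, dtype=torch.bool)
    for child, parent in enumerate(heads):
        if parent >= 0:
            mask[child, parent] = mask[parent, child] = True
    return mask


if __name__ == "__main__":
    seq_len, num_heads, d_model = 6, 4, 64
    x = torch.randn(2, seq_len, d_model)
    head_masks = torch.stack([
        sequential_mask(seq_len, 1),                    # tight local window
        sequential_mask(seq_len, 3),                    # wider window
        dependency_mask([1, -1, 1, 2, 2, 4]),           # toy parse tree
        torch.ones(seq_len, seq_len, dtype=torch.bool)  # unconstrained head
    ])
    layer = MultiMaskAttention(d_model, num_heads)
    print(layer(x, head_masks).shape)  # torch.Size([2, 6, 64])
```

A per-head mask is a natural generalization of the single-mask strategies the abstract contrasts with: the unconstrained head preserves vanilla global attention, while the remaining heads specialize in sequential, relative-position, or syntactic structure.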
