Article

MS-Transformer: Introduce multiple structural priors into a unified transformer for encoding sentences

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 72

Publisher

ACADEMIC PRESS LTD - ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2021.101304

Keywords

Sentence representation; Transformer; Natural language processing

Funding

  1. Key Development Program of the Ministry of Science and Technology, China [2019YFF0303003]
  2. National Natural Science Foundation of China [61976068]
  3. "Hundreds, Millions" Engineering Science and Technology Major Special Project of Heilongjiang Province, China [2020ZX14A02]

Abstract

This paper introduces the Multiple Structural Priors Guided Transformer (MS-Transformer), which integrates different types of structural priors into Transformers through a novel multi-head attention mechanism in order to capture the rich structural information of texts. Experimental results show that MS-Transformer significantly outperforms other strong models.
Transformers have been widely used in recent NLP studies. Unlike CNNs or RNNs, the vanilla Transformer is position-insensitive and thus incapable of capturing the structural priors among sequences of words. Existing studies commonly apply a single mask strategy to the Transformer to incorporate one structural prior, failing to model the more abundant structural information of texts. In this paper, we aim to introduce multiple types of structural priors into Transformers, proposing the Multiple Structural Priors Guided Transformer (MS-Transformer), which maps different structural priors to different attention heads through a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative positions of words. To capture the latent hierarchical structure of texts, we extract this information not only from word contexts but also from dependency syntax trees. Experimental results on three tasks show that MS-Transformer achieves significant improvements over strong baselines.
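
To make the multi-mask idea concrete, below is a minimal PyTorch sketch of a multi-head attention layer in which each head receives its own structural-prior mask: a local sequential-order window or a dependency-tree neighbourhood. This is an illustration of the general mechanism the abstract describes, not the authors' released implementation; the names (MultiMaskAttention, sequential_mask, dependency_mask) and the exact mask constructions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiMaskAttention(nn.Module):
    """Multi-head self-attention where every head gets its own boolean mask,
    so different heads can encode different structural priors
    (hypothetical sketch of a multi-mask multi-head attention)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, head_masks: torch.Tensor) -> torch.Tensor:
        # x:          (batch, seq_len, d_model)
        # head_masks: (num_heads, seq_len, seq_len), True where attention is allowed
        batch, seq_len, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (batch, seq_len, self.num_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Broadcast one mask per head across the batch dimension.
        scores = scores.masked_fill(~head_masks.unsqueeze(0), float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, -1)
        return self.proj(out)


def sequential_mask(seq_len: int, window: int) -> torch.Tensor:
    """Relative-position prior: each token attends only to tokens within
    `window` positions of itself (the diagonal is always kept)."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window


def dependency_mask(heads: list) -> torch.Tensor:
    """Syntactic prior: token i attends to itself, its dependency head,
    and its children. heads[i] is the parent index of token i (-1 = root)."""
    n = len(heads)
    mask = torch.eye(n, dtype=torch.bool)
    for child, parent in enumerate(heads):
        if parent >= 0:
            mask[child, parent] = mask[parent, child] = True
    return mask


if __name__ == "__main__":
    seq_len, num_heads, d_model = 6, 4, 64
    x = torch.randn(2, seq_len, d_model)
    head_masks = torch.stack([
        sequential_mask(seq_len, 1),                    # tight local window
        sequential_mask(seq_len, 3),                    # wider window
        dependency_mask([1, -1, 1, 2, 2, 4]),           # toy parse tree
        torch.ones(seq_len, seq_len, dtype=torch.bool)  # unconstrained head
    ])
    layer = MultiMaskAttention(d_model, num_heads)
    print(layer(x, head_masks).shape)  # torch.Size([2, 6, 64])
```

A per-head mask is a natural generalization of the single-mask strategies the abstract contrasts with: the unconstrained head preserves vanilla global attention, while the remaining heads specialize in sequential, relative-position, or syntactic structure.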
