Article

Unsupervised multi-sense language models for natural language processing tasks

Journal

NEURAL NETWORKS
Volume 142, Pages 397-409

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2021.05.023

Keywords

Language model; Natural language processing (NLP); Multi-sense word modeling

Funding

  1. Industrial Strategic Technology Development Program - Ministry of Trade, Industry and Energy (MOTIE, Korea) [10072064]
  2. Institute of Information & Communications Technology Planning & Evaluation - Ministry of Science and ICT (MSIT, Korea) [2016-0-00562]

Abstract

This paper introduces a sense-aware framework that processes multi-sense word information without relying on annotated data. The framework comprises a context representation stage, a sense-labeling stage, and a multi-sense LM learning stage.
Existing language models (LMs) represent each word with only a single representation, which is unsuitable for processing words with multiple meanings. This issue has often been compounded by the scarcity of large-scale data annotated with word meanings. In this paper, we propose a sense-aware framework that can process multi-sense word information without relying on annotated data. In contrast to existing multi-sense representation models, which handle information in a restricted context, our framework provides context representations encoded without ignoring word-order information or long-term dependencies. The proposed framework consists of a context representation stage to encode the variable-size context, a sense-labeling stage that uses unsupervised clustering to infer a probable sense for a word in each context, and a multi-sense LM (MSLM) learning stage to learn the multi-sense representations. In particular, for the evaluation of MSLMs with different vocabulary sizes, we propose a new metric, unigram-normalized perplexity (PPLu), which can also be understood as the negated mutual information between a word and its context. We additionally provide a theoretical verification of PPLu under changes in vocabulary size, and adopt a method of estimating the number of senses that does not require a further hyperparameter search for LM performance. For the LMs in our framework, both unidirectional and bidirectional architectures based on long short-term memory (LSTM) and Transformers are adopted. We conduct comprehensive experiments on three language modeling datasets to perform quantitative and qualitative comparisons of various LMs. Our MSLM outperforms single-sense LMs (SSLMs) with the same network architecture and parameters, and also shows better performance on several downstream natural language processing tasks in the General Language Understanding Evaluation (GLUE) and SuperGLUE benchmarks. (C) 2021 Elsevier Ltd. All rights reserved.
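
The sense-labeling stage is described above only at a high level: context representations of each word occurrence are grouped by unsupervised clustering, and the cluster index serves as a pseudo sense label. The sketch below is an illustrative reconstruction of that idea, not the authors' implementation; the k-means choice, the 128-dimensional context vectors, and the helper name label_senses are assumptions made for the example.

```python
# Illustrative sketch of a sense-labeling stage (not the authors' implementation):
# per-occurrence context vectors of a word type are clustered without supervision,
# and each occurrence receives its cluster index as a sense label.
import numpy as np
from sklearn.cluster import KMeans

def label_senses(context_vectors: np.ndarray, num_senses: int, seed: int = 0) -> np.ndarray:
    """Cluster context vectors of one word type into `num_senses` groups and
    return a sense label for every occurrence."""
    kmeans = KMeans(n_clusters=num_senses, n_init=10, random_state=seed)
    return kmeans.fit_predict(context_vectors)

if __name__ == "__main__":
    # Hypothetical example: 200 occurrences of an ambiguous word, each represented
    # by a 128-dimensional context vector produced by some context encoder.
    rng = np.random.default_rng(0)
    contexts = rng.normal(size=(200, 128))
    senses = label_senses(contexts, num_senses=2)
    print(senses[:10])  # sense id per occurrence, e.g. [0 1 0 0 1 ...]
```

The sense-labeled corpus can then be used to train the MSLM, with each (word, sense) pair treated as a distinct prediction target.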
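
The abstract characterizes PPLu only verbally. Written out from that description (a sketch under the stated interpretation; the paper's exact definition and normalization should be taken as authoritative), the unigram-normalized perplexity of a held-out sequence w_1, ..., w_N is

$$
\mathrm{PPLu} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log \frac{P(w_i \mid c_i)}{P(w_i)} \right),
$$

where P(w_i | c_i) is the model's probability of word w_i given its context c_i and P(w_i) is the unigram probability. The logarithm of PPLu is then an empirical estimate of the negated mutual information between a word and its context, which matches the interpretation given in the abstract; dividing by the unigram probability removes the vocabulary-size-dependent baseline that ordinary perplexity retains, making scores comparable across models with different vocabulary sizes.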

