Journal
INFORMATION SCIENCES
Volume 532, Pages 16-32
Publisher: ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.04.048
Keywords
Lexical chains; Natural language processing; Word embeddings; Document classification; Synsets
Funding
- Science Without Borders Brazilian Government Scholarship Program, CNPq [205581/2014-5]
- Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Programa de Doutorado Sanduíche no Exterior (PDSE) [88881.186965/201801]
Abstract
The relationships between words in a sentence often tell us more about the underlying semantic content of a document than its individual words do. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II. These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks of a single system. In short, our approach makes three main contributions: (i) a set of techniques that fully integrate word embeddings and lexical chains; (ii) a more robust semantic representation that considers the latent relations between words in a document; and (iii) lightweight word embedding models that can be extended to any natural language task. We assess the knowledge of pretrained models to evaluate their robustness in the document classification task. The proposed techniques are tested against seven word embedding algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems. (C) 2020 Elsevier Inc. All rights reserved.
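The core idea of embedding-based lexical chaining described above can be illustrated with a minimal sketch: greedily grow a chain while each new word stays close (by cosine similarity) to the chain's centroid embedding, and start a new chain otherwise. This is an illustrative assumption of the general technique, not the paper's exact FLLC II/FXLC II algorithms; the toy 2-d vectors stand in for real pretrained embeddings (e.g. word2vec or GloVe), and the `flexible_chains` function name and threshold are hypothetical.

```python
import math

# Toy 2-d word vectors (assumption: real systems use pretrained
# embeddings such as word2vec/GloVe; these are illustrative only).
EMB = {
    "dog":     [0.90, 0.10],
    "puppy":   [0.85, 0.20],
    "cat":     [0.80, 0.30],
    "economy": [0.10, 0.90],
    "market":  [0.15, 0.85],
}

def cos(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flexible_chains(words, threshold=0.9):
    """Greedy sketch of 'flexible' lexical chaining: append a word to
    the current chain while its embedding is similar to the chain's
    centroid; otherwise start a new chain."""
    chains = []
    for w in words:
        if w not in EMB:
            continue  # out-of-vocabulary words are skipped in this sketch
        v = EMB[w]
        if chains:
            members = [EMB[c] for c in chains[-1]]
            centroid = [sum(dim) / len(members) for dim in zip(*members)]
            if cos(v, centroid) >= threshold:
                chains[-1].append(w)
                continue
        chains.append([w])
    return chains

print(flexible_chains(["dog", "puppy", "cat", "economy", "market"]))
# → [['dog', 'puppy', 'cat'], ['economy', 'market']]
```

Each resulting chain can then be represented by a single vector (e.g. the centroid of its members), giving the lighter-weight document representation the abstract alludes to.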