Proceedings Paper

CluWords: Exploiting Semantic Word Clustering Representation for Enhanced Topic Modeling

PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19) (2019)

Journal

PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19)

Volume -, Issue -, Pages 753-761

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3289600.3291032

Keywords

Data Representation; Topic Modeling; Word Embedding

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

CAPES
CNPq
Finep
Fapemig
Mundiale
Astrein
project InWeb
project MASWeb

Abstract

In this paper, we advance the state of the art in topic modeling by means of a new document representation based on pre-trained word embeddings for non-probabilistic matrix factorization. Specifically, our strategy, called CluWords, exploits the nearest words of a given pre-trained word embedding to generate meta-words capable of enhancing the document representation in terms of both syntactic and semantic information. The novel contributions of our solution include: (i) the introduction of a novel data representation for topic modeling based on syntactic and semantic relationships derived from distances calculated within a pre-trained word embedding space and (ii) the proposal of a new TF-IDF-based strategy, developed specifically to weight the CluWords. In our extensive experimental evaluation, covering 12 datasets and 8 state-of-the-art baselines, we exceed the baselines (with a few ties) in almost all cases, with gains of more than 50% against the best baselines (achieving up to 80% against some runners-up). Finally, we show that our method is able to improve document representation for the task of automatic text classification.
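The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, the neighbourhood cutoff, or the weighting formula, so the cosine similarity, the threshold `alpha`, the smoothed IDF, and the function name `cluwords_sketch` are all assumptions made for illustration.

```python
import numpy as np

def cluwords_sketch(tf, embeddings, alpha=0.4):
    """Hypothetical sketch of a meta-word document representation.

    tf:         (n_docs, n_terms) raw term counts
    embeddings: (n_terms, dim) pre-trained word vectors, one row per term
    alpha:      assumed similarity cutoff defining "nearest words"
    """
    # Cosine similarity between every pair of vocabulary terms.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T

    # Meta-word for term t: the terms whose similarity to t is >= alpha,
    # each kept with its similarity as a weight (assumption).
    clu = np.where(sim >= alpha, sim, 0.0)

    # Meta-word frequency: a document scores on meta-word t whenever it
    # contains any of t's neighbours, scaled by how close they are.
    clu_tf = tf @ clu.T

    # Standard smoothed IDF over meta-words. This stands in for the
    # paper's own TF-IDF-based weighting, which the abstract does not give.
    n_docs = tf.shape[0]
    df = (clu_tf > 0).sum(axis=0)
    idf = np.log((1 + n_docs) / (1 + df)) + 1.0
    return clu_tf * idf
```

In this sketch, a document that contains only one word still receives a non-zero weight for the meta-words of that word's close neighbours, which is the enrichment effect the abstract describes. The resulting non-negative matrix could then be passed to a matrix factorization method to extract topics.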
