Proceedings Paper

Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations

WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018) (2018)

Journal

WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018)

Pages 1105-1114

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3178876.3186009

Keywords

Topic modeling; short texts; non-negative matrix factorization; word embedding

Categories

Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods

Funding

National Science Foundation [IIS-1619028, IIS-1707498, IIS-1646881]
National Research Foundation of Korea (NRF) - Korean government (MSIP) [NRF-2016R1C1B2015924]
Directorate for Computer & Information Science & Engineering
Division of Information & Intelligent Systems [1707498] (Funding Source: National Science Foundation)

Abstract

Short texts are a prevalent form of social communication on the Internet, and billions of them are generated every day. Discovering knowledge from them has attracted considerable interest from both industry and academia. Because short texts carry limited contextual information and are sparse, noisy, and ambiguous, automatically learning topics from them remains an important challenge. To tackle this problem, in this paper, we propose a semantics-assisted non-negative matrix factorization (SeaNMF) model to discover topics in short texts. It effectively incorporates word-context semantic correlations into the model, where the semantic relationships between words and their contexts are learned from the skip-gram view of the corpus. The SeaNMF model is solved using a block coordinate descent algorithm. We also develop a sparse variant of the SeaNMF model that can achieve better model interpretability. Extensive quantitative evaluations on various real-world short-text datasets demonstrate the superior performance of the proposed models over several other state-of-the-art methods in terms of topic coherence and classification accuracy. Qualitative semantic analysis demonstrates the interpretability of our models by discovering meaningful and consistent topics. With its simple formulation and superior performance, SeaNMF can serve as an effective standard topic model for short texts.
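The abstract names two ingredients: word-context correlations learned from a skip-gram view of the corpus, and a non-negative factorization solved by block coordinate descent. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes (1) each short text is treated as one context window and the correlations form a shifted positive PMI (SPPMI) matrix `S` with shift `kappa`; (2) the objective is ||A - H W^T||_F^2 + alpha * ||S - W Wc^T||_F^2 with W, Wc, H >= 0, where `A` is the document-word count matrix; and (3) each block is updated column by column with HALS-style closed-form steps. All function names (`sppmi_matrix`, `hals_step`, `seanmf_sketch`, `objective`) and the toy corpus are hypothetical, and the sparse variant is not covered.

```python
import math
import random
from collections import Counter
from itertools import permutations


def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(r) for r in zip(*X)]


def sppmi_matrix(docs, vocab, kappa=1.0):
    """Word-context SPPMI matrix (ASSUMPTION: a whole short text is one window)."""
    idx = {w: i for i, w in enumerate(vocab)}
    pairs = Counter()
    for doc in docs:
        toks = [idx[t] for t in doc if t in idx]
        for a, b in permutations(toks, 2):  # ordered (word, context) pairs
            pairs[a, b] += 1
    total = sum(pairs.values())
    w_cnt, c_cnt = Counter(), Counter()
    for (a, b), n in pairs.items():
        w_cnt[a] += n
        c_cnt[b] += n
    S = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), n in pairs.items():
        pmi = math.log(n * total / (w_cnt[a] * c_cnt[b]))
        S[a][b] = max(pmi - math.log(kappa), 0.0)  # shift by log(kappa), clip at 0
    return S


def hals_step(M, P, Q, eps=1e-10):
    """One HALS sweep for min ||X - M B^T||^2 with M >= eps, given P = X B and
    Q = B^T B. Each entry update is the exact 1-D minimizer, clipped at eps."""
    K = len(Q)
    for k in range(K):
        for i in range(len(M)):
            grad = sum(M[i][j] * Q[j][k] for j in range(K)) - P[i][k]
            M[i][k] = max(eps, M[i][k] - grad / Q[k][k])


def seanmf_sketch(A, S, K, alpha=1.0, iters=50, seed=0):
    """Block coordinate descent on the ASSUMED objective
    ||A - H W^T||^2 + alpha * ||S - W Wc^T||^2 with W, Wc, H >= 0.
    A: N x V doc-word counts; S: V x V word-context matrix."""
    rng = random.Random(seed)
    N, V = len(A), len(S)

    def init(rows):  # strictly positive start, above the eps floor
        return [[rng.random() + 0.1 for _ in range(K)] for _ in range(rows)]

    W, Wc, H = init(V), init(V), init(N)
    At, St = transpose(A), transpose(S)
    for _ in range(iters):
        # Block W: it appears in both terms, so their statistics are summed.
        HtH, WctWc = matmul(transpose(H), H), matmul(transpose(Wc), Wc)
        AtH, SWc = matmul(At, H), matmul(S, Wc)
        P = [[AtH[i][k] + alpha * SWc[i][k] for k in range(K)] for i in range(V)]
        Q = [[HtH[a][b] + alpha * WctWc[a][b] for b in range(K)] for a in range(K)]
        hals_step(W, P, Q)
        # Block Wc: only the S term depends on it, so alpha cancels.
        WtW = matmul(transpose(W), W)
        hals_step(Wc, matmul(St, W), WtW)
        # Block H: only the A term depends on it.
        hals_step(H, matmul(A, W), WtW)
    return W, Wc, H


def objective(A, S, W, Wc, H, alpha=1.0):
    def sq_err(X, L, R):  # ||X - L R^T||_F^2
        Y = matmul(L, transpose(R))
        return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
    return sq_err(A, H, W) + alpha * sq_err(S, W, Wc)


if __name__ == "__main__":
    docs = [["apple", "iphone", "ios"], ["iphone", "ios", "app"],
            ["goal", "match", "team"], ["team", "goal", "league"]]
    vocab = sorted({t for d in docs for t in d})
    A = [[float(d.count(w)) for w in vocab] for d in docs]
    S = sppmi_matrix(docs, vocab)
    W, Wc, H = seanmf_sketch(A, S, K=2)
    for k in range(2):
        top = sorted(range(len(vocab)), key=lambda i: -W[i][k])[:3]
        print(f"topic {k}:", [vocab[i] for i in top])
```

Because W appears in both reconstruction terms, its update simply sums the two terms' statistics (P = A^T H + alpha * S Wc, Q = H^T H + alpha * Wc^T Wc) before a single sweep. Each entry update is an exact minimizer over the feasible set, so under these assumptions the objective should not increase from one iteration to the next.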