Article

Topic Modeling in Embedding Spaces

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (2020)

Journal

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS

Volume 8, Issue -, Pages 439-453

Publisher

MIT PRESS

DOI: 10.1162/tacl_a_00325

Categories

Computer Science, Artificial Intelligence; Linguistics; Language & Linguistics

Funding

ONR [N00014-17-1-2131, N00014-15-1-2209]
NIH [1U01MH115727-01]
NSF [CCF-1740833]
DARPA [SD2 FA8750-18-C-0130]
Amazon
NVIDIA
Simons Foundation
EU's Horizon 2020 R&I programme under the Marie Sklodowska-Curie grant [706760]
Google

Abstract

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the embedded topic model (ETM), a generative model of documents that marries traditional topic models with word embeddings. More specifically, the ETM models each word with a categorical distribution whose natural parameter is the inner product between the word's embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation, in terms of both topic quality and predictive performance.
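
To make the word-level model in the abstract concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes that the categorical distribution for a topic is obtained by applying a softmax to the inner products between every word embedding and that topic's embedding. The names `rho`, `alpha`, and `topic_word_distributions`, as well as the toy dimensions and random values, are illustrative assumptions. The amortized variational inference step is not shown.

```python
import numpy as np

def topic_word_distributions(rho, alpha):
    """Return a (K, V) matrix whose row k is a categorical distribution over words.

    rho:   (V, L) array of word embeddings (assumed name and layout).
    alpha: (K, L) array of topic embeddings (assumed name and layout).
    """
    # Natural parameters: inner product of each topic embedding with each word embedding.
    logits = alpha @ rho.T                       # shape (K, V)
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    # Normalize each row so it sums to one (softmax over the vocabulary).
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example with random embeddings (purely illustrative values).
rng = np.random.default_rng(0)
V, L, K = 6, 4, 3                                # vocabulary size, embedding dim, topics
rho = rng.normal(size=(V, L))
alpha = rng.normal(size=(K, L))
beta = topic_word_distributions(rho, alpha)

# Draw one word index from topic 0's distribution.
word = rng.choice(V, p=beta[0])
```

Because words and topics share one embedding space in this sketch, a word whose embedding lies close to a topic's embedding receives high probability under that topic, whether the word is frequent or rare.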
