Article

Effective interrelation of Bayesian nonparametric document clustering and embedded-topic modeling

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 234, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.107591

Keywords

Text analysis; Word embeddings; Topic modeling; Document clustering; Bayesian nonparametrics; Dirichlet process clustering

An innovative unsupervised approach interrelates topic modeling with document clustering through Bayesian generative modeling and posterior inference. The approach unifies the two tasks and carries them out jointly, so that the relationships between word-embedding topics and cluster components are inferred automatically. An extensive empirical study on benchmark real-world corpora shows that the method is more effective at partitioning text collections and discovering their semantics than state-of-the-art competitors and tailored baselines.
Topic modeling and document clustering can be synergistically interrelated. We present an innovative unsupervised approach that interrelates the two tasks. The approach uses Bayesian generative modeling and posterior inference to unify them and carry them out jointly. Specifically, a Bayesian nonparametric model of text collections formulates a novel interrelationship of word-embedding topics with a Dirichlet process mixture of cluster components. The Dirichlet process mixture allows for countably infinite clusters and permits their actual number to be inferred automatically in a statistically principled manner. All latent clusters and topics under this model are inferred through collapsed Gibbs sampling and parameter estimation. An extensive empirical study of the approach is carried out on benchmark real-world corpora of text documents. The experimental results show that it is more effective at partitioning text collections and coherently discovering their semantics than state-of-the-art competitors and tailored baselines. Computational efficiency is also examined under different conditions to analyze scalability. © 2021 Elsevier B.V. All rights reserved.
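
The abstract notes that a Dirichlet process mixture admits countably infinite clusters while the number actually in use is inferred from the data. As a rough illustration of that one idea only, the sketch below draws a partition from the Chinese restaurant process, the standard sequential view of the Dirichlet process prior over partitions. It is not the paper's model: it contains no topics, word embeddings, likelihood term, or Gibbs sweep, and the names `sample_crp`, `n_docs`, and `alpha` are hypothetical choices made for this example.

```python
import random


def sample_crp(n_docs, alpha, seed=0):
    """Draw cluster assignments for n_docs items from a Chinese restaurant process.

    alpha is the concentration parameter (assumed > 0); larger values tend to
    open more clusters. Illustration of the prior only, not the paper's sampler.
    """
    rng = random.Random(seed)
    assignments = []  # cluster index chosen for each document
    sizes = []        # number of documents currently in each cluster
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, sizes


if __name__ == "__main__":
    assignments, sizes = sample_crp(n_docs=200, alpha=1.0)
    print(len(sizes), "clusters with sizes", sizes)
```

The number of clusters is never fixed in advance: it grows only when a document opens a new one. In a collapsed Gibbs sampler for a full mixture model, these prior weights would typically be multiplied by each cluster's predictive likelihood for the document before sampling.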
