Article

Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model

Publisher

Association for Computing Machinery
DOI: 10.1145/3502727

Keywords

Hierarchical Dirichlet process; Bayesian nonparametric topic model; Beta-Liouville distribution; stochastic and variational optimizations; predictive distributions

Abstract

This article introduces a Bayesian nonparametric (BNP) approach to model selection and topic sharing in topic models. By placing a hierarchical Dirichlet process (HDP) prior over the BNP topic model, the approach characterizes dependencies between documents more effectively and yields a more robust and realistic compression algorithm for information modeling.

In topic models, collections are organized as documents that arise as mixtures over latent clusters called topics; a topic is a distribution over the vocabulary. In large-scale applications, parametric or finite topic mixture models such as latent Dirichlet allocation (LDA) and its variants perform poorly because of their reduced hypothesis space. In this article, we address the problems of model selection and of topic sharing across multiple documents that limit standard parametric topic models.

We propose as an alternative a BNP topic model in which the HDP prior models document topic mixtures through their multinomials on the infinite simplex. We further propose an asymmetric Beta-Liouville (BL) distribution as a diffuse base measure for the corpus-level Dirichlet process (DP) over a measurable space; this base measure captures the highly heterogeneous structure of the set of all topics that describes the corpus probability measure. For consistency in posterior inference and predictive distributions, we efficiently characterize random probability measures whose limits are the global and local DPs, approximating the HDP through the stick-breaking formulation with GEM (Griffiths-Engen-McCloskey) random variables. Because the diffuse BL prior is conjugate to the count-data distribution, we obtain an improved version of the standard HDP, which is usually based on a symmetric Dirichlet (Dir).

In addition, to improve on the coordinate ascent framework while retaining its deterministic nature, our model implements an online optimization method based on document-level stochastic variational inference with natural gradients, which accommodates fast topic learning when processing large collections of text documents.

The high per-document predictive likelihood obtained in comparison with competing models is consistent with the robustness of our fully asymmetric BL-based HDP. While ensuring the predictive accuracy of the model through the probability of held-out documents, we also add a combination of metrics, such as topic coherence and topic diversity, to improve the quality and interpretability of the discovered topics, and we compare the performance of our model on these metrics against the standard symmetric LDA. We show that the performance of online HDP-LBLA (latent BL allocation) is the asymptote for parametric topic models. The accuracy of the results (improved predictive distributions of the held-out documents) is a product of the model's ability to efficiently characterize dependencies between documents (topic correlation): because documents can now easily share topics, the result is a much more robust and realistic compression algorithm for information modeling.
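For background, the asymmetric Beta-Liouville base measure mentioned above is, in its commonly written form from the literature, a distribution over the (D-1)-simplex with D + 2 shape parameters (the abstract itself does not reproduce the density; it is quoted here only as reference):

```latex
\mathrm{BL}(\boldsymbol{x}\mid\alpha_1,\dots,\alpha_D,\alpha,\beta)
  = \frac{\Gamma\!\big(\sum_{d=1}^{D}\alpha_d\big)\,\Gamma(\alpha+\beta)}
         {\Gamma(\alpha)\,\Gamma(\beta)}
    \prod_{d=1}^{D}\frac{x_d^{\alpha_d-1}}{\Gamma(\alpha_d)}
    \left(\sum_{d=1}^{D}x_d\right)^{\alpha-\sum_{d=1}^{D}\alpha_d}
    \left(1-\sum_{d=1}^{D}x_d\right)^{\beta-1}
```

Unlike the symmetric Dirichlet, which is governed by a single concentration parameter, the additional parameters alpha and beta provide the asymmetry the model exploits at the corpus level.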
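A minimal sketch of the truncated stick-breaking (GEM) construction the abstract refers to, under two hypothetical truncation levels T (corpus level) and K (document level); this illustrates the general two-level approximation of the HDP rather than the authors' exact variational scheme:

```python
import numpy as np

def stick_breaking(concentration, truncation, rng):
    """Truncated GEM(concentration) weights: v_k ~ Beta(1, concentration),
    pi_k = v_k * prod_{j<k} (1 - v_j); the last stick closes the simplex."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # absorb the remaining stick mass at the truncation level
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * stick_left

rng = np.random.default_rng(0)
gamma, alpha0, T, K = 1.0, 1.0, 100, 20  # hypothetical concentrations and truncations

# Corpus-level DP: global weights over at most T shared topics.
beta_global = stick_breaking(gamma, T, rng)

# Document-level DP(alpha0, G0): each document re-weights atoms drawn
# from the shared global measure, which is what lets documents share topics.
pi_doc = stick_breaking(alpha0, K, rng)
atom_ids = rng.choice(T, size=K, p=beta_global)  # indices of the shared topics
```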
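The document-level stochastic step can be illustrated with a generic natural-gradient update in the style of stochastic variational inference; the names lam, eta, tau, and kappa below are illustrative, and the paper's BL-specific sufficient statistics would replace the Dirichlet-style ones assumed here:

```python
import numpy as np

def svi_update(lam, doc_stats, D, t, eta=0.01, tau=1.0, kappa=0.7):
    """One stochastic natural-gradient step on the topic-word variational
    parameters lam (K x V), using a single document sampled uniformly
    from a corpus of D documents. doc_stats holds that document's expected
    topic-word counts; eta is the prior on topic-word distributions."""
    rho = (t + tau) ** (-kappa)       # Robbins-Monro step size, kappa in (0.5, 1]
    lam_hat = eta + D * doc_stats     # noisy single-document estimate of the optimum
    return (1.0 - rho) * lam + rho * lam_hat
```

Because the step follows the natural gradient of the variational objective, each update reduces to a cheap convex combination of the current parameters and a one-document estimate, which is what makes the online scheme scale to large collections.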
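The evaluation quantities mentioned above (held-out predictive likelihood, topic coherence, topic diversity) admit short generic formulations; the versions below are common variants from the topic-modeling literature, not necessarily the paper's exact implementations:

```python
import itertools
import numpy as np

def heldout_log_likelihood(word_ids, word_counts, theta_d, phi):
    """Per-word predictive log-likelihood of one held-out document:
    log sum_k theta_dk * phi_kw, averaged over tokens."""
    probs = theta_d @ phi[:, word_ids]   # mixture probability per word type
    return float(word_counts @ np.log(probs)) / word_counts.sum()

def topic_diversity(topics, top_n=25):
    """Fraction of unique words among the top-n words of every topic."""
    top_words = [w for topic in topics for w in topic[:top_n]]
    return len(set(top_words)) / len(top_words)

def umass_coherence(top_words, doc_sets, eps=1.0):
    """UMass coherence: average log co-occurrence over ordered word pairs.
    doc_sets maps a word to the set of training documents containing it."""
    pairs = list(itertools.combinations(top_words, 2))
    return sum(
        np.log((len(doc_sets[wi] & doc_sets[wj]) + eps) / len(doc_sets[wj]))
        for wi, wj in pairs
    ) / len(pairs)
```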
