Article

A CORRELATED TOPIC MODEL OF SCIENCE

Journal

ANNALS OF APPLIED STATISTICS
Volume 1, Issue 1, Pages 17-35

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/07-AOAS114

Keywords

Hierarchical models; approximate posterior inference; variational methods; text analysis

Funding

  1. NSF [IIS-0312814, IIS-0427206]
  2. DARPA CALO
  3. Google

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than about X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139-177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. We apply the CTM to the articles from Science published from 1990 to 1999, a data set that comprises 57M words. The CTM gives a better fit of the data than LDA, and we demonstrate its use as an exploratory tool for large document collections.
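
The contrast between the Dirichlet and the logistic normal can be illustrated with a small simulation. The following is a minimal sketch, not the paper's implementation: it assumes the logistic normal is sampled by drawing a Gaussian vector and mapping it to the simplex with a softmax, and the helper name `logistic_normal`, the three-topic setting, the covariance values, and the sample size are all illustrative assumptions. Components of a Dirichlet draw are always negatively correlated, whereas a covariance matrix that ties two topics together lets their proportions rise and fall together.

```python
import numpy as np

def logistic_normal(mu, sigma, size, rng):
    """Draw topic proportions: eta ~ N(mu, sigma), theta = softmax(eta)."""
    eta = rng.multivariate_normal(mu, sigma, size=size)
    eta -= eta.max(axis=1, keepdims=True)  # stabilise the exponentials
    theta = np.exp(eta)
    return theta / theta.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
K = 3  # toy number of topics (assumption, not from the paper)
mu = np.zeros(K)
# Hypothetical covariance: topics 0 and 1 are strongly tied, topic 2 is independent.
sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

theta_ln = logistic_normal(mu, sigma, size=5000, rng=rng)
theta_dir = rng.dirichlet(np.ones(K), size=5000)  # symmetric Dirichlet for comparison

# Sample correlation between the proportions of topics 0 and 1 under each prior.
corr_ln = np.corrcoef(theta_ln[:, 0], theta_ln[:, 1])[0, 1]
corr_dir = np.corrcoef(theta_dir[:, 0], theta_dir[:, 1])[0, 1]
print(f"logistic normal corr(theta_0, theta_1) = {corr_ln:+.2f}")
print(f"Dirichlet       corr(theta_0, theta_1) = {corr_dir:+.2f}")
```

In this toy setting the logistic normal draws show a positive correlation between the first two topics, while the Dirichlet draws show a negative one regardless of its parameters; this is the flexibility the abstract attributes to the CTM.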
