Article

The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies

Journal

JOURNAL OF THE ACM
Volume 57, Issue 2

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/1667053.1667056

Keywords

Algorithms; Experimentation; Bayesian nonparametric statistics; unsupervised learning

Funding

  1. ONR [175-6343]
  2. NSF CAREER [0745520]
  3. Google and Microsoft Research
  4. NSF [BCS-0631518]
  5. DARPA CALO

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics, and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
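
The preferential attachment dynamics described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It draws one root-to-leaf path per document from a tree, choosing an existing branch with probability proportional to the number of earlier documents that took it and opening a new branch otherwise. The function name `sample_ncrp_paths`, the truncation to a fixed `depth`, and the concentration parameter `gamma` are assumptions made for illustration; none of them appears in the listing above.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one path per document from a depth-truncated nested CRP (illustrative sketch)."""
    rng = random.Random(seed)
    # children[node] maps a child index to the number of documents routed through it.
    # A node is identified by the tuple of child indices leading to it; the root is ().
    children = defaultdict(dict)
    paths = []
    for _ in range(num_docs):
        node = ()
        for _ in range(depth):
            counts = children[node]
            total = sum(counts.values())
            # Existing child k is chosen with probability counts[k] / (total + gamma);
            # a new child is opened with probability gamma / (total + gamma).
            r = rng.uniform(0, total + gamma)
            choice = len(counts)  # default: open a new child (assumed labeling scheme)
            acc = 0.0
            for k, c in counts.items():
                acc += c
                if r < acc:
                    choice = k
                    break
            counts[choice] = counts.get(choice, 0) + 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=10, depth=3, gamma=1.0, seed=1):
        print(path)
```

Documents whose paths share a prefix share the nodes along that prefix, which is how the clustering at multiple levels of abstraction arises. Smaller values of `gamma` make new branches rarer, so more documents share paths.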
