
Short text clustering based on Pitman-Yor process mixture model

APPLIED INTELLIGENCE (2018)

Journal

APPLIED INTELLIGENCE

Volume 48, Issue 7, Pages 1802-1812

Publisher

SPRINGER

DOI: 10.1007/s10489-017-1055-4

Keywords

LDA; Pitman-Yor process; Short text clustering

Funding

Natural Science Foundation of Jiangsu Province of China [BK20170513, BK20161338]
National Natural Science Foundation of China [61703362, 61402203]
Natural Science Foundation of the Higher Education Institutions of Jiangsu Province of China [17KJB520045]
Science and Technology Planning Project of Yangzhou of China [YZ2016238]

Abstract

Models based on the Dirichlet Multinomial Mixture (DMM) for short text clustering require a maximum possible number of clusters to be specified before the actual number can be inferred. Choosing this value is difficult because the true number of clusters in a short-text collection is not known beforehand. Moreover, with a Dirichlet process as the prior, the cluster distribution in DMM decays exponentially as the number of clusters increases. We therefore propose a novel model based on the Pitman-Yor process that captures the power-law behavior of the cluster distribution. Specifically, each text chooses either one of the active clusters or a new cluster, with probabilities derived from the Pitman-Yor Process Mixture model (PYPM). Discriminative and nondiscriminative words are identified automatically to improve clustering. Parameters are estimated efficiently by collapsed Gibbs sampling, and experimental results show that PYPM is robust and effective compared with state-of-the-art models.
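
The choice between an active cluster and a new cluster can be illustrated with the textbook Pitman-Yor (two-parameter Chinese restaurant process) predictive rule. The sketch below is not the paper's sampler: it covers only the prior term and omits the word-likelihood factors that a collapsed Gibbs step for PYPM would multiply in. The function names, the concentration `alpha`, and the discount `d` are assumptions introduced here for illustration.

```python
import random


def pitman_yor_prior(cluster_sizes, alpha=1.0, d=0.5):
    """Prior probabilities of joining each active cluster, then a new cluster.

    Standard Pitman-Yor predictive rule (an assumption, not taken from the paper):
      existing cluster k: (n_k - d) / (n + alpha)
      new cluster:        (alpha + d * K) / (n + alpha)
    where n is the number of texts assigned so far and K the number of clusters.
    """
    if alpha <= 0 or not (0.0 <= d < 1.0):
        raise ValueError("this sketch expects alpha > 0 and 0 <= d < 1")
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    denom = n + alpha
    probs = [(size - d) / denom for size in cluster_sizes]
    probs.append((alpha + d * k) / denom)  # last entry: open a new cluster
    return probs


def simulate_partition(num_texts, alpha=1.0, d=0.5, seed=0):
    """Draw cluster sizes from the prior alone (no text content involved)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_texts):
        probs = pitman_yor_prior(sizes, alpha, d)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):
            sizes.append(1)      # the text starts a new cluster
        else:
            sizes[choice] += 1   # the text joins an active cluster
    return sizes


if __name__ == "__main__":
    # d = 0 recovers the Dirichlet process rule; d > 0 favors more small clusters.
    for discount in (0.0, 0.5):
        sizes = simulate_partition(2000, alpha=1.0, d=discount)
        print(f"d={discount}: {len(sizes)} clusters")
```

Setting `d = 0` reduces the rule to the Dirichlet process case, so the same function can be used to compare how the number of clusters grows under the two priors.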
