Journal
BAYESIAN ANALYSIS
Volume 4, Issue 2, Pages 367-391
Publisher
INT SOC BAYESIAN ANALYSIS
DOI: 10.1214/09-BA414
Keywords
adjusted Rand index; cluster analysis; Dirichlet process mixture model; Markov chain Monte Carlo
Funding
- Deutsche Forschungsgemeinschaft [SFB 475]
In this paper we address the problem of obtaining a single clustering estimate $\hat{c}$ based on an MCMC sample of clusterings $c^{(1)}, c^{(2)}, \ldots, c^{(M)}$ from the posterior distribution of a Bayesian cluster model. Methods to derive $\hat{c}$ when the number of groups $K$ varies between the clusterings are reviewed and discussed. These include the maximum a posteriori (MAP) estimate and methods based on the posterior similarity matrix, a matrix containing the posterior probabilities that the observations $i$ and $j$ are in the same cluster. The posterior similarity matrix is related to a commonly used loss function by Binder (1978). Minimization of the loss is shown to be equivalent to maximizing the Rand index between estimated and true clustering. We propose new criteria for estimating a clustering, which are based on the posterior expected adjusted Rand index. The criteria are shown to possess a shrinkage property and outperform Binder's loss in a simulation study and in an application to gene expression data. They also perform favorably compared to other clustering procedures.
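The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`posterior_similarity`, `binder_loss`, `best_draw_by_expected_ari`) are hypothetical, Binder's loss is taken with equal misclassification costs, and the search for an estimate is restricted to the sampled clusterings, each scored by its average adjusted Rand index against all draws as a Monte Carlo stand-in for the posterior expected adjusted Rand index. At least two observations are assumed.

```python
import numpy as np
from math import comb


def posterior_similarity(samples):
    """Estimate P(c_i = c_j | data) from an (M, n) array of label draws."""
    samples = np.asarray(samples)
    # same[m, i, j] is True when draw m puts observations i and j together.
    same = samples[:, :, None] == samples[:, None, :]
    return same.mean(axis=0)


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label vectors of equal length (n >= 2)."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    # Contingency table of cluster co-memberships.
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(a.size, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. both put everything in one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def binder_loss(c, psm):
    """Posterior expected Binder loss with equal costs (assumption of this sketch)."""
    c = np.asarray(c)
    same = (c[:, None] == c[None, :]).astype(float)
    upper = np.triu_indices(c.size, k=1)  # each pair i < j once
    return np.abs(same[upper] - psm[upper]).sum()


def best_draw_by_expected_ari(samples):
    """Return the sampled clustering with the highest average ARI to all draws."""
    samples = np.asarray(samples)
    scores = [
        np.mean([adjusted_rand_index(c, other) for other in samples])
        for c in samples
    ]
    best = int(np.argmax(scores))
    return samples[best], scores[best]


# Toy MCMC output: three draws over four observations (made-up labels).
draws = np.array([[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]])
psm = posterior_similarity(draws)
estimate, score = best_draw_by_expected_ari(draws)
```

The first two toy draws describe the same partition under different labels, which is why both criteria work on co-clustering indicators rather than on the raw labels. Restricting candidates to the sampled clusterings keeps the sketch short; a search over a larger candidate set (for example, cuts of a hierarchical clustering of `1 - psm`) is another option.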