Article

Bayesian mixture model based clustering of replicated microarray data

Journal

BIOINFORMATICS
Volume 20, Issue 8, Pages 1222-1232

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bth068

Keywords

-

Funding

  1. NHGRI NIH HHS [1R21HG002849-01] Funding Source: Medline
  2. NHLBI NIH HHS [1P50HL073996-01, 5R01HL072370-02] Funding Source: Medline
  3. NIAID NIH HHS [1U54AI057141-01, 1R21AI052028-01, 5P01 AI052106-02] Funding Source: Medline
  4. NIDA NIH HHS [1 P30 DA015625-01] Funding Source: Medline
  5. NIDDK NIH HHS [5U24DK058813-02] Funding Source: Medline
  6. NIEHS NIH HHS [1U19ES011387-02, ES04908-12, 2P30 ES06096-11] Funding Source: Medline

Abstract

Motivation: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability.

Results: We developed different variants of Bayesian mixture-based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed overall better than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters.
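To make the general procedure concrete, the following is a minimal sketch, not the authors' implementation, of a Gibbs sampler for a finite Gaussian mixture over replicated expression profiles. It assumes a spherical between-replicates variance structure, a fixed number of clusters K, and fixed illustrative hyperparameters (sigma2, tau2, alpha); the infinite mixture, the 'elliptical' variance structure and the 'reverse annealing' heuristic described in the abstract are not shown.

```python
import numpy as np

def gibbs_finite_mixture(X, K, n_iter=500, sigma2=0.25, tau2=4.0, alpha=1.0, seed=0):
    """Gibbs sampler for a finite Gaussian mixture over replicated profiles.

    X : array of shape (G, R, T) -- G genes, R replicates, T conditions.
    Returns sampled cluster labels of shape (n_iter, G), from which posterior
    summaries such as pairwise co-clustering probabilities can be estimated.
    Hyperparameters sigma2 (replicate variance), tau2 (prior variance of
    cluster means) and alpha (Dirichlet concentration) are illustrative.
    """
    rng = np.random.default_rng(seed)
    G, R, T = X.shape
    z = rng.integers(K, size=G)                  # initial cluster labels
    theta = rng.normal(0.0, 1.0, size=(K, T))    # cluster mean profiles
    pi = np.full(K, 1.0 / K)                     # mixing weights
    labels = np.empty((n_iter, G), dtype=int)

    for it in range(n_iter):
        # 1) Sample each gene's cluster label given weights and cluster means.
        for g in range(G):
            # log-likelihood of the R replicate profiles under each cluster
            resid = X[g][None, :, :] - theta[:, None, :]          # (K, R, T)
            loglik = -0.5 * np.sum(resid ** 2, axis=(1, 2)) / sigma2
            logp = np.log(pi) + loglik
            p = np.exp(logp - logp.max())
            z[g] = rng.choice(K, p=p / p.sum())

        # 2) Sample cluster mean profiles from their conjugate normal posterior
        #    (prior mean assumed to be zero for each condition).
        for k in range(K):
            members = X[z == k]                                   # (n_k, R, T)
            n_obs = members.shape[0] * R
            prec = n_obs / sigma2 + 1.0 / tau2
            mean = (members.sum(axis=(0, 1)) / sigma2) / prec
            theta[k] = rng.normal(mean, np.sqrt(1.0 / prec))

        # 3) Sample mixing weights from their Dirichlet posterior.
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha + counts)

        labels[it] = z
    return labels
```

After discarding burn-in iterations, the returned label samples can be used to estimate the posterior probability that any two genes fall in the same cluster, which is the kind of posterior-distribution summary the clustering is built from; the infinite-mixture and 'elliptical'-variance variants of the paper would replace steps (1)-(3) with the corresponding conditional draws.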

