Article

Critical limitations of consensus clustering in class discovery

SCIENTIFIC REPORTS (2014)

Journal

SCIENTIFIC REPORTS

Volume 4, Issue -, Pages -

Publisher

NATURE PORTFOLIO

DOI: 10.1038/srep06207

Category

Multidisciplinary Sciences

Abstract

Consensus clustering (CC) has been adopted for unsupervised class discovery in many genomic studies. It calculates how frequently two samples are grouped together in repeated clustering runs, and uses the resulting pairwise consensus rates for visual demonstration that clusters exist, for comparing cluster stability, and for estimating the optimal cluster number (K). However, the sensitivity and specificity of CC have not been systematically assessed. Through simulations we find that CC is able to divide randomly generated unimodal data into apparently stable clusters for a range of K, essentially reporting chance partitions of cluster-less data. For data with known structure, the common implementations of CC perform poorly in identifying the true K. These results suggest that CC should be applied and interpreted with caution. We find that a new metric based on CC, the proportion of ambiguously clustered pairs (PAC), infers K equally or more reliably than similar methods in simulated data with known K. Our overall approach involves the use of realistic null distributions based on the observed gene-gene correlation structure in a given study, and the implementation of PAC to more accurately estimate K. We discuss the strengths of our approach in the context of other ensemble-based methods.
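
The two quantities named in the abstract, the pairwise consensus rate and PAC, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small k-means as the base clusterer, 80% subsampling per run, and PAC thresholds of 0.1 and 0.9, none of which are stated in the abstract. The function names (`kmeans_labels`, `consensus_matrix`, `pac`) are hypothetical.

```python
import random
from itertools import combinations


def kmeans_labels(points, k, rng, iters=20):
    """Tiny Lloyd's k-means on coordinate tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Consensus rate per pair (i, j), i < j: co-clustered runs / co-sampled runs."""
    rng = random.Random(seed)
    n = len(points)
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = dict(together)
    for _ in range(runs):
        # Assumption: each run clusters a random subsample of the data.
        idx = sorted(rng.sample(range(n), max(k, int(frac * n))))
        labels = kmeans_labels([points[i] for i in idx], k, rng)
        for (a, i), (b, j) in combinations(list(enumerate(idx)), 2):
            sampled[(i, j)] += 1
            together[(i, j)] += labels[a] == labels[b]
    # Pairs that were never sampled together are left out.
    return {pair: together[pair] / sampled[pair] for pair in sampled if sampled[pair]}


def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of pairs whose consensus rate lies strictly between the thresholds."""
    values = list(consensus.values())
    return sum(lower < v < upper for v in values) / len(values)


if __name__ == "__main__":
    # Illustrative unimodal data: a single Gaussian cloud with no true clusters.
    data_rng = random.Random(1)
    blob = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
    for k in (2, 3, 4):
        print(k, round(pac(consensus_matrix(blob, k)), 3))
```

A lower PAC means fewer pairs with intermediate consensus rates, so under these assumptions one would compare PAC across candidate values of K and prefer the K with the smallest value. As the abstract cautions, an apparently stable partition can also arise from cluster-less data, so such values are best read against a suitable null distribution.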
