A comparison of cluster analysis methods using DNA methylation data

Journal

BIOINFORMATICS
Volume 20, Issue 12, Pages 1896-1904

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bth176

Funding

  1. NCI NIH HHS [R01 CA097346, CA097346, R01 CA096958, R01 CA001815] Funding Source: Medline
  2. NIEHS NIH HHS [R21 ES011672] Funding Source: Medline

Motivation: Aberrant DNA methylation is common in cancer. DNA methylation profiles differ between tumor types and subtypes and provide a powerful diagnostic tool for identifying clusters of samples and/or genes. DNA methylation data obtained with the quantitative, highly sensitive MethyLight technology is not normally distributed; it frequently contains an excess of zeros. Established tools to analyze this type of data do not exist. Here, we evaluate a variety of methods for cluster analysis to determine which is most reliable.

Results: We introduce a Bernoulli-lognormal mixture model for clustering DNA methylation data obtained using MethyLight. We model the outcomes using a two-part distribution having discrete and continuous components. It is compared with standard cluster analysis approaches for continuous data and for discrete data. In a simulation study, we find that the two-part model has the lowest classification error rate for mixture outcome data compared with other approaches. The methods are illustrated using DNA methylation data from a study of lung cancer cell lines. Compared with competing hierarchical clustering methods, the mixture model approaches have the lowest cross-validation error for detecting lung cancer subtype (non-small versus small cell). The Bernoulli-lognormal mixture assigns observations to subgroups with the lowest uncertainty.
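The two-part distribution described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the parametrization (a point mass at zero with probability `p_zero`, and a lognormal with parameters `mu` and `sigma` for positive values), the assumption that loci are independent within a cluster, and the function names `two_part_logpdf` and `posterior_cluster_probs` are all assumptions made here for illustration. Given fixed parameters for each cluster, the code computes the posterior probability that one sample belongs to each cluster; it does not estimate the parameters.

```python
import numpy as np


def two_part_logpdf(y, p_zero, mu, sigma):
    """Log-density of an assumed Bernoulli-lognormal two-part distribution.

    y == 0 has probability p_zero; a positive y has density
    (1 - p_zero) * Lognormal(y; mu, sigma).
    """
    y = np.asarray(y, dtype=float)
    # Broadcast scalar or per-locus parameters to the shape of y.
    p_zero, mu, sigma = (
        np.broadcast_to(np.asarray(a, dtype=float), y.shape)
        for a in (p_zero, mu, sigma)
    )
    out = np.log(p_zero)  # discrete part: log P(y = 0)
    pos = y > 0
    log_y = np.log(y[pos])
    # Continuous part: log(1 - p_zero) plus the lognormal log-density.
    out[pos] = (
        np.log1p(-p_zero[pos])
        - log_y
        - np.log(sigma[pos])
        - 0.5 * np.log(2.0 * np.pi)
        - 0.5 * ((log_y - mu[pos]) / sigma[pos]) ** 2
    )
    return out


def posterior_cluster_probs(y, weights, p_zero, mu, sigma):
    """Posterior cluster membership for one sample.

    y has shape (n_loci,); p_zero, mu, sigma have shape (K, n_loci);
    weights has shape (K,). Loci are assumed independent within a cluster.
    """
    weights = np.asarray(weights, dtype=float)
    p_zero, mu, sigma = (np.asarray(a, dtype=float) for a in (p_zero, mu, sigma))
    log_post = np.array([
        np.log(weights[k]) + two_part_logpdf(y, p_zero[k], mu[k], sigma[k]).sum()
        for k in range(len(weights))
    ])
    log_post -= log_post.max()  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Under these assumptions, a sample with zeros at every locus is assigned mostly to the cluster whose loci have the higher zero probability. The largest posterior probability gives a simple measure of how certain the assignment is.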
