Article

Uncertainty quantification of reference-based cellular deconvolution algorithms

Journal

EPIGENETICS
Volume 18, Issue 1, Pages -

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/15592294.2022.2137659

Keywords

DNA methylation; epigenetic epidemiology; Illumina 450K array; Illumina EPIC array; cellular heterogeneity

Most epigenetic epidemiology studies have generated genome-wide profiles from bulk tissues, but such profiles can be confounded by variation in cellular composition. In this study, the researchers developed a metric, the CETYGO score, to assess the accuracy of derived cellular heterogeneity variables. They found that the CETYGO score distinguishes inaccurate and incomplete deconvolutions when applied to reconstructed whole blood profiles. The study also showed that the accuracy of estimated cellular composition is influenced by technical and biological factors, including sex, age, and smoking status.
The majority of epigenetic epidemiology studies to date have generated genome-wide profiles from bulk tissues (e.g., whole blood); however, these are vulnerable to confounding from variation in cellular composition. Proxies for cellular composition can be mathematically derived from the bulk tissue profiles using a deconvolution algorithm; however, there is no method to assess the validity of these estimates for a dataset where the true cellular proportions are unknown. In this study, we describe, validate and characterize a sample-level accuracy metric for derived cellular heterogeneity variables. The CETYGO score captures the deviation between a sample's DNA methylation profile and its expected profile given the estimated cellular proportions and cell type reference profiles. We demonstrate that the CETYGO score consistently distinguishes inaccurate and incomplete deconvolutions when applied to reconstructed whole blood profiles. By applying our novel metric to >6,300 empirical whole blood profiles, we find that estimating accurate cellular composition is influenced by both technical and biological variation. In particular, we show that when using a common reference panel for whole blood, less accurate estimates are generated for females, neonates, older individuals and smokers. Our results highlight the utility of a metric to assess the accuracy of cellular deconvolution, and describe how it can enhance studies of DNA methylation that are reliant on statistical proxies for cellular heterogeneity. To facilitate incorporating our methodology into existing pipelines, we have made it freely available as an R package (https://github.com/ds420/CETYGO).
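
The idea behind the score described in the abstract can be illustrated with a minimal sketch. It assumes that the deviation between the observed and expected profiles is summarized as a root-mean-square error across CpG sites, and that the expected profile is the proportion-weighted sum of the cell type reference profiles. The function names and toy values below are hypothetical and are not the API of the CETYGO R package.

```python
from math import sqrt


def expected_profile(reference, proportions):
    """Proportion-weighted sum of cell type reference profiles.

    reference:   one row per CpG site, one column per cell type (beta values)
    proportions: estimated fraction of each cell type, in column order
    """
    return [
        sum(p * beta for p, beta in zip(proportions, site_row))
        for site_row in reference
    ]


def deviation_score(observed, reference, proportions):
    """Assumed form of the score: RMSE between observed and expected profiles."""
    expected = expected_profile(reference, proportions)
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy reference panel (hypothetical values): 3 CpG sites x 2 cell types.
reference = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
estimated_proportions = [0.7, 0.3]

# A sample fully explained by the reference panel scores zero.
well_explained = expected_profile(reference, estimated_proportions)
score_good = deviation_score(well_explained, reference, estimated_proportions)

# A sample shifted by 0.1 at every site scores about 0.1.
poorly_explained = [beta + 0.1 for beta in well_explained]
score_poor = deviation_score(poorly_explained, reference, estimated_proportions)
```

Under these assumptions, a larger score means the sample's profile is less well explained by the estimated proportions and the reference panel, which is the sense in which the abstract describes the metric as flagging inaccurate or incomplete deconvolutions.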
