Article

Significant variation in the performance of DNA methylation predictors across data preprocessing and normalization strategies

Journal

GENOME BIOLOGY
Volume 23, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s13059-022-02793-w

Keywords

DNA methylation; Infinium MethylationEPIC array; DNAm predictors; Consistency; Replicability; Biomarkers; Jackson Heart Study

Funding

  1. NIH [1U01AG060908-01]

This study systematically evaluates the effect of different data preprocessing and normalization strategies on the consistency of DNAm-based predictors and finds that appropriate processing and normalization steps can improve their consistency. The successful or unsuccessful removal of technical variation also significantly impacts downstream phenotypic association analysis.
Background: DNA methylation (DNAm)-based predictors hold great promise to serve as clinical tools for health interventions and disease management. While these algorithms often have high prediction accuracy, the consistency of their performance remains to be determined. We therefore conduct a systematic evaluation across 101 different DNAm data preprocessing and normalization strategies and assess how each analytical strategy affects the consistency of 41 DNAm-based predictors.

Results: Our analyses are conducted in a large EPIC DNAm array dataset from the Jackson Heart Study (N = 2053) that included 146 pairs of technical replicate samples. By estimating the average absolute agreement between replicate pairs, we show that 32 out of 41 predictors (78%) demonstrate excellent consistency when appropriate data processing and normalization steps are implemented. Across all pairs of predictors, we find a moderate correlation in performance across analytical strategies (mean rho = 0.40, SD = 0.27), highlighting significant heterogeneity in performance across algorithms. Successful or unsuccessful removal of technical variation furthermore significantly impacts downstream phenotypic association analysis, such as all-cause mortality risk associations.

Conclusions: We show that DNAm-based algorithms are sensitive to technical variation. The right choice of data processing strategy is important to achieve reproducible estimates and improve prediction accuracy in downstream phenotypic association analyses. For each of the 41 DNAm predictors, we report its degree of consistency and provide the best performing analytical strategy as a guideline for the research community. As DNAm-based predictors become more and more widely used, our work helps improve their performance and standardize their implementation.
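
The abstract measures consistency as absolute agreement between technical replicate pairs but does not name the statistic. One common choice for this kind of measurement is a two-way, single-measure, absolute-agreement intraclass correlation coefficient (ICC). The sketch below assumes that statistic; the function name and the example values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def icc_absolute_agreement(pairs):
    """Two-way, single-measure, absolute-agreement ICC.

    `pairs` is a list of rows, one per sample, each holding the predictor
    value from every replicate (two values for replicate pairs).
    This is an assumed statistic, not necessarily the paper's exact method.
    """
    rows = [[float(v) for v in row] for row in pairs]
    n = len(rows)          # number of samples
    k = len(rows[0])       # number of replicates per sample

    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-sample mean square
    msc = ss_cols / (k - 1)               # between-replicate mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical predictor values for four replicate pairs.
replicates = [(51.2, 50.8), (63.0, 64.1), (47.5, 47.9), (70.3, 69.6)]
print(round(icc_absolute_agreement(replicates), 3))
```

Because the coefficient penalizes systematic shifts between replicates as well as random noise, a preprocessing strategy that leaves a constant batch offset would score lower than one that removes it, even if the two replicates remain perfectly correlated.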
