4.8 Article

Addressing Missing Data in GC x GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication

Journal

ANALYTICAL CHEMISTRY
Volume 94, Issue 31, Pages 10912-10920

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.analchem.1c04093

Funding

  1. Cystic Fibrosis Foundation [Hill17P0, ADHS18-198861]
  2. Cystic Fibrosis Foundation Therapeutics [Hill18A0-CI]
  3. National Institutes of Health [HOFFMA20Y2-OUT, P30 DK089507]
  4. Arizona Biomedical Research Centre [R56HL139846]
  5. Arizona State University Graduate and Professional Student Association Publication Grant Program

Missing data is a significant issue in metabolomics. This study identifies the primary types of missingness and compares imputation strategies using real-world data sets. It introduces a within-replicate imputation approach and an R package for the analysis. The results show that Gibbs sampler imputation and Random Forest performed best for the missingness types identified (MAR and MNAR), and suggest that within-replicate imputation improves the reproducibility of peak quantification for biomarker discovery.
Missing data is a significant issue in metabolomics that is often neglected during data preprocessing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC x GC) data sets. We also present these goals in the context of experimental replication, whereby imputation is conducted within replicates (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two GC x GC data sets, missingness was most likely of the missing at-random (MAR) and missing not-at-random (MNAR) types as opposed to missing completely at-random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR data compared against single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal component analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
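
To make the distinction between imputation that ignores replication and within-replicate imputation concrete, the following is a minimal sketch in Python. It is not the MetabImpute implementation or API: the function names, the toy peak values, and the grouping rule are all assumptions made for illustration. It uses half-minimum imputation only because it is the simplest of the single-value methods listed in the abstract, which reports that Gibbs sampler imputation and Random Forest performed better.

```python
import math
from collections import defaultdict


def half_min_impute(values):
    """Replace NaN entries with half the minimum observed value.

    If every value is missing there is nothing to base the fill on,
    so the input is returned unchanged.
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        return list(values)
    fill = min(observed) / 2
    return [fill if math.isnan(v) else v for v in values]


def impute_within_replicates(values, groups):
    """Apply half_min_impute separately inside each replicate group.

    `groups[i]` is a hypothetical label naming the biological sample
    that measurement `i` is a replicate of.
    """
    index_by_group = defaultdict(list)
    for i, label in enumerate(groups):
        index_by_group[label].append(i)

    result = list(values)
    for indices in index_by_group.values():
        filled = half_min_impute([values[i] for i in indices])
        for i, v in zip(indices, filled):
            result[i] = v
    return result


# Toy example: one peak measured in three replicates of samples A and B.
nan = float("nan")
peak = [10.0, nan, 12.0, 100.0, 104.0, nan]
groups = ["A", "A", "A", "B", "B", "B"]

global_fill = half_min_impute(peak)                   # ignores replication
within_fill = impute_within_replicates(peak, groups)  # respects replication
```

In this toy case the missing replicate of sample B is filled from the global minimum when replication is ignored, which places it far from its sibling replicates, whereas the within-replicate version fills it from sample B's own observed values. This is one way to read the abstract's claim about reproducibility of peak quantification, under the stated assumptions.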
