4.7 Article

TIGER: technical variation elimination for metabolomics data using ensemble learning architecture

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 2, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab535

Keywords

metabolomics; machine learning; ensemble learning; predictive modelling; longitudinal analysis

Funding

  1. Helmholtz Zentrum Munchen - German Research Center for Environmental Health - German Federal Ministry of Education and Research
  2. State of Bavaria
  3. Munich Center of Health Sciences (MC - Health), Ludwig-Maximilians Universitat, LMUinnovativ
  4. German Federal Ministry of Health (Berlin, Germany)
  5. German Federal Ministry of Education and Research
  6. European Union - European Institute of Innovation and Technology (EIT) [210997-iPDM-GO]
  7. European Union [821508]

Large metabolomics datasets often contain unwanted technical variations that can affect data analysis. Existing methods have limitations in handling multiple types of quality control samples. In this study, a non-parametric method called TIGER is developed, which outperforms other methods in terms of robustness and reliability.
Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis.
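
The abstract describes removing technical variation with an ensemble model trained on QC samples. The sketch below illustrates only that general idea on a toy signal drift, and it is not TIGER's algorithm or the TIGERr API: a bagged ensemble of least-squares lines stands in for random forest so that the example needs no third-party packages. All function names, the drift rate, the noise level, and the sample values are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap draw: every x is identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_bagged_drift(qc_order, qc_ratio, n_models=50, seed=0):
    """Fit one line per bootstrap resample of the QC samples (bagging)."""
    rng = random.Random(seed)
    n = len(qc_order)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([qc_order[i] for i in idx],
                               [qc_ratio[i] for i in idx]))
    return models


def predict_drift(models, order):
    """Average the base models' predictions of the technical drift factor."""
    return statistics.fmean([a + b * order for a, b in models])


def correct(models, order, value):
    """Divide a measured value by the predicted drift factor."""
    return value / predict_drift(models, order)


# Hypothetical toy data: the signal drifts upward by 0.4% per injection.
def drift(order):
    return 1.0 + 0.004 * order


rng = random.Random(1)
qc_target = 100.0                      # assumed known QC concentration
qc_order = list(range(0, 100, 10))     # QC injected every 10th position
qc_measured = [qc_target * drift(o) * rng.gauss(1.0, 0.01) for o in qc_order]
qc_ratio = [m / qc_target for m in qc_measured]

# Train on QC samples only, then apply the correction to a subject sample.
models = fit_bagged_drift(qc_order, qc_ratio)

subject_order = 55
subject_true = 80.0
subject_measured = subject_true * drift(subject_order)
subject_corrected = correct(models, subject_order, subject_measured)

print(f"measured:  {subject_measured:.2f}")
print(f"corrected: {subject_corrected:.2f}")
```

The toy model uses injection order as its only feature, and the drift is linear by construction, so a line fits it well. Real technical variation is rarely this simple, which is one reason a non-parametric learner such as random forest is attractive for this task.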
