Article

The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures

Journal

GENETIC EPIDEMIOLOGY
Volume 45, Issue 5, Pages 485-536

Publisher

WILEY
DOI: 10.1002/gepi.22383

Keywords

genetic architecture; genetic heterogeneity; machine learning; psychiatric genetics; random forests; rare variants

Funding

  1. NIMH [R01MH077139]
  2. Sylvan C. Herman Foundation
  3. Stanley Medical Research Institute
  4. Swedish Research Council [2009-4959, 2011-4659]
  5. NIMH Grand Opportunity grant [RCMH089905]
  6. Waypoint Research Institute, Waypoint Centre for Mental Health Care

The Translational Machine (TM) is a machine learning-based analytic pipeline that translates genotypic data into biologically contextualized features, reducing the confounding effects of population substructure and allowing for greater interpretability. The TM consists of three main components: feature engineering, feature filtering, and feature selection, which enable the evaluation of variant contributions under complex genetic architectures. This approach integrates biological information within conceptual frameworks and overcomes some limitations of existing methods.
The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset-/pathways-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case-control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.
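The first component, feature engineering, can be illustrated with a minimal sketch. This is not the authors' R implementation. It assumes a simple collapsing scheme in which each variant's minor-allele count is summed into a gene-level feature stratified by frequency class and predicted function. The annotation table, the 0.01 rare-variant cutoff, the feature naming, and the function `engineer_features` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical annotation table (not from the paper):
# variant id -> (gene, minor allele frequency, predicted damaging?)
ANNOTATIONS = {
    "v1": ("GENE_A", 0.0005, True),
    "v2": ("GENE_A", 0.2000, False),
    "v3": ("GENE_B", 0.0020, True),
    "v4": ("GENE_B", 0.0008, False),
}

RARE_MAF = 0.01  # assumed cutoff for "rare"; not taken from the paper


def engineer_features(genotypes, annotations, rare_maf=RARE_MAF):
    """Collapse per-variant minor-allele counts into gene-level features.

    genotypes: dict mapping sample id -> dict of variant id -> allele count (0, 1, 2).
    Returns a dict mapping sample id -> dict of feature name -> summed allele count,
    where a feature name is "gene|frequency class|functional class".
    """
    features = {}
    for sample, calls in genotypes.items():
        row = defaultdict(int)
        for variant, count in calls.items():
            if variant not in annotations:
                continue  # unannotated variants are skipped in this sketch
            gene, maf, damaging = annotations[variant]
            freq = "rare" if maf < rare_maf else "common"
            func = "damaging" if damaging else "benign"
            row[f"{gene}|{freq}|{func}"] += count
        features[sample] = dict(row)
    return features


# Toy genotype calls for one case and one control (fabricated for illustration only).
genotypes = {
    "case_1": {"v1": 1, "v2": 2, "v3": 0, "v4": 1},
    "control_1": {"v1": 0, "v2": 1, "v3": 0, "v4": 0},
}
engineered = engineer_features(genotypes, ANNOTATIONS)
```

In a fuller pipeline, a feature matrix of this kind would feed the filtering and selection stages, for example a random forest, which is listed among the paper's keywords. Those stages are omitted here.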
