Journal
GENETIC EPIDEMIOLOGY
Volume 45, Issue 5, Pages 485-536
Publisher
WILEY
DOI: 10.1002/gepi.22383
Keywords
genetic architecture; genetic heterogeneity; machine learning; psychiatric genetics; random forests; rare variants
Funding
- NIMH [R01MH077139]
- Sylvan C. Herman Foundation
- Stanley Medical Research Institute
- Swedish Research Council [2009-4959, 2011-4659]
- NIMH Grand Opportunity grant [RCMH089905]
- Waypoint Research Institute, Waypoint Centre for Mental Health Care
The Translational Machine (TM) is a machine learning-based analytic pipeline that translates genotypic data into biologically contextualized features, reducing the confounding effects of population substructure and allowing for greater interpretability. The TM consists of three main components: feature engineering, feature filtering, and feature selection, which enable the evaluation of variant contributions under complex genetic architectures. This approach integrates biological information within conceptual frameworks and overcomes some limitations of existing methods.
The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotype/variant call data into biologically contextualized features. These features richly characterize complex variant architectures and permit greater interpretability and biological replication. The TM also reduces the potentially confounding effects of population substructure on outcome prediction. It consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that place simple variant calls/genotypes in their biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce the dimensionality and noise of both the original genotype calls and the engineered features. Third, a powerful ML algorithm for feature selection differentiates the contributions of risk variants across the spectra of variant frequency and predicted function. The TM simultaneously evaluates the potential contributions of variants operating under polygenic and heterogeneous models of genetic architecture. It enables the integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to gene-set/pathway-based and collapsing methods, while overcoming some of these methods' limitations. The full TM pipeline is executed in R. We present our approach and initial findings from its application to a whole-exome schizophrenia case-control data set. These TM procedures extend the findings of the primary investigation and yield novel results.
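To make the first two stages more concrete, the Python sketch below illustrates the general idea under stated assumptions. It is not the TM's R code, and it omits the third stage (random-forest-based feature selection). Individual rare variants are usually too sparse to test one at a time, so collapsing methods pool them. Here, minor-allele dosages are summed per sample within each (gene, frequency bin, functional class) group. The resulting features are then screened with a model-free, nonparametric statistic: the case-versus-control AUC, which equals the Mann-Whitney U statistic scaled by the number of case-control pairs. The frequency thresholds (1% and 5%), the functional labels, the AUC filter itself, all function names, and the toy data are hypothetical choices. The abstract does not specify them.

```python
"""Hypothetical sketch: collapse variant calls into burden features, then
filter them with a nonparametric (Mann-Whitney/AUC) statistic."""


def maf_bin(maf):
    # Hypothetical thresholds: <1% rare, 1-5% low-frequency, else common.
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low"
    return "common"


def engineer_features(variants, genotypes):
    """Sum minor-allele dosages per sample within each
    (gene, frequency bin, functional class) group."""
    keys = sorted({(v["gene"], maf_bin(v["maf"]), v["func"]) for v in variants})
    features = {}
    for sample, dosages in genotypes.items():
        counts = dict.fromkeys(keys, 0)
        for v, dosage in zip(variants, dosages):
            counts[(v["gene"], maf_bin(v["maf"]), v["func"])] += dosage
        features[sample] = counts
    return keys, features


def auc_score(case_vals, ctrl_vals):
    """P(case value > control value), ties counted as 1/2.
    Equals the Mann-Whitney U statistic divided by n_case * n_ctrl."""
    wins = 0.0
    for a in case_vals:
        for b in ctrl_vals:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(case_vals) * len(ctrl_vals))


def filter_features(keys, features, labels, top_k):
    """Keep the top_k features whose AUC deviates most from 0.5 (no association)."""
    cases = [s for s in features if labels[s] == 1]
    ctrls = [s for s in features if labels[s] == 0]
    scored = []
    for key in keys:
        auc = auc_score([features[s][key] for s in cases],
                        [features[s][key] for s in ctrls])
        scored.append((abs(auc - 0.5), key))
    # Sort by descending score; break ties by feature key for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [key for _, key in scored[:top_k]]


# Toy data (invented for illustration only).
VARIANTS = [
    {"gene": "GENE_A", "maf": 0.002, "func": "lof"},
    {"gene": "GENE_A", "maf": 0.003, "func": "lof"},
    {"gene": "GENE_B", "maf": 0.20, "func": "synonymous"},
]
GENOTYPES = {  # minor-allele dosage (0/1/2) per variant, in VARIANTS order
    "case1": [1, 0, 1],
    "case2": [0, 1, 1],
    "ctrl1": [0, 0, 1],
    "ctrl2": [0, 0, 2],
}
LABELS = {"case1": 1, "case2": 1, "ctrl1": 0, "ctrl2": 0}

if __name__ == "__main__":
    keys, feats = engineer_features(VARIANTS, GENOTYPES)
    print(filter_features(keys, feats, LABELS, top_k=1))
    # prints [('GENE_A', 'rare', 'lof')]
```

In this toy example, each rare loss-of-function variant in GENE_A is carried by only one case. The collapsed GENE_A feature, however, separates cases from controls perfectly (AUC = 1.0), while the common synonymous GENE_B feature does not. In a real pipeline, the filtered features would then be passed to a feature-selection learner such as a random forest.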