☆ 4.3 Article

The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures

GENETIC EPIDEMIOLOGY (2021)

Journal

GENETIC EPIDEMIOLOGY

Volume 45, Issue 5, Pages 485-536

Publisher

WILEY

DOI: 10.1002/gepi.22383

Keywords

genetic architecture; genetic heterogeneity; machine learning; psychiatric genetics; random forests; rare variants

Categories

Genetics & Heredity; Mathematical & Computational Biology

Funding

NIMH [R01MH077139]
Sylvan C. Herman Foundation
Stanley Medical Research Institute
Swedish Research Council [2009-4959, 2011-4659]
NIMH Grand Opportunity grant [RCMH089905]
Waypoint Research Institute, Waypoint Centre for Mental Health Care

Abstract

The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to gene-set-/pathway-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case-control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.
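To make the idea of "contextualizing" variant calls concrete, the following is a minimal Python sketch of one collapsing-style feature-engineering step: per-variant minor-allele counts are summed into gene-by-frequency-bin burden features, so rare and common variation in the same gene become separate features. It is not the TM's R implementation. The gene names, allele frequencies, the 1% and 5% frequency cut-offs, and the helpers `engineer_features` and `drop_low_variance` are all hypothetical illustrations. The carrier-count filter is a deliberately simple stand-in; it is not the model-free, nonparametric ML-based filtering the TM uses, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict

# Hypothetical annotations: variant ID -> (gene, minor allele frequency).
# Values are made up purely for illustration.
ANNOTATIONS = {
    "var1": ("GENE_A", 0.0005),
    "var2": ("GENE_A", 0.02),
    "var3": ("GENE_A", 0.30),
    "var4": ("GENE_B", 0.0008),
    "var5": ("GENE_B", 0.15),
}


def frequency_bin(maf, rare=0.01, low=0.05):
    """Assign a variant to a frequency bin; the cut-offs are assumptions."""
    if maf < rare:
        return "rare"
    if maf < low:
        return "low"
    return "common"


def engineer_features(genotypes, annotations):
    """Collapse minor-allele counts (0/1/2) into gene x frequency-bin burdens.

    genotypes: sample ID -> {variant ID -> minor-allele count}
    Returns:   sample ID -> {"GENE:bin" -> summed allele count}
    """
    features = {}
    for sample, calls in genotypes.items():
        burden = defaultdict(int)
        for variant, count in calls.items():
            if variant not in annotations:
                continue  # unannotated variants are skipped in this sketch
            gene, maf = annotations[variant]
            burden[f"{gene}:{frequency_bin(maf)}"] += count
        features[sample] = dict(burden)
    return features


def drop_low_variance(features, min_carriers=2):
    """Simple stand-in filter: keep features nonzero in >= min_carriers samples."""
    carriers = defaultdict(int)
    for row in features.values():
        for name, value in row.items():
            if value > 0:
                carriers[name] += 1
    keep = {name for name, n in carriers.items() if n >= min_carriers}
    return {s: {k: v for k, v in row.items() if k in keep}
            for s, row in features.items()}


# Usage with a toy case-control genotype table (hypothetical data).
genotypes = {
    "case1": {"var1": 1, "var3": 2, "var4": 1},
    "case2": {"var1": 1, "var2": 1},
    "ctrl1": {"var3": 1, "var5": 1},
}
feats = engineer_features(genotypes, ANNOTATIONS)
filtered = drop_low_variance(feats)
print(filtered)
```

Binning by frequency before collapsing keeps rare-variant burden in a gene from being averaged together with common variation, which mirrors, in a simplified way, the abstract's goal of separating risk contributions across the variant frequency spectrum.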