☆ 4.3 Review

Brief review of regression-based and machine learning methods in genetic epidemiology: the Genetic Analysis Workshop 17 experience

GENETIC EPIDEMIOLOGY (2011)

Journal

GENETIC EPIDEMIOLOGY

Volume 35, Issue -, Pages S5-S11

Publisher

WILEY

DOI: 10.1002/gepi.20642

Keywords

unsupervised learning; supervised learning; cluster analysis; logistic regression; Poisson regression; logic regression; LASSO; ridge regression; decision trees; random forests; cross-validation; software

Categories

Genetics & Heredity; Mathematical & Computational Biology

Funding

National Institute for Arthritis and Musculoskeletal and Skin Diseases
National Human Genome Research Institute
Center for Information Technology of the National Institutes of Health
National Institutes of Health, National Heart, Lung, and Blood Institute [HL100245]
CENTER FOR INFORMATION TECHNOLOGY [ZIACT000268, ZIACT000271] Funding Source: NIH RePORTER
NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [RC1HL100245] Funding Source: NIH RePORTER
NATIONAL HUMAN GENOME RESEARCH INSTITUTE [ZIAHG000153] Funding Source: NIH RePORTER

Abstract

Genetic Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies. Unsupervised methods, such as cluster analysis, were used for data segmentation and subset selection. Supervised learning methods, which include regression-based methods (e.g., generalized linear models, logic regression, and regularized regression) and tree-based methods (e.g., decision trees and random forests), were used for variable selection (selecting the genetic and clinical features most associated with or predictive of the outcome) and prediction (developing models using common and rare genetic variants to accurately predict the outcome), with the outcome being case-control status or quantitative trait value. We include a discussion of cross-validation for model selection and assessment, and a description of available software resources for these methods. Genet. Epidemiol. 35:S5-S11, 2011. © 2011 Wiley Periodicals, Inc.