4.7 Article

Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets

Journal

AMERICAN JOURNAL OF HUMAN GENETICS
Volume 110, Issue 2, Pages 313-325

Publisher

CELL PRESS
DOI: 10.1016/j.ajhg.2022.12.008

This paper introduces an unsupervised and scalable method for selecting ancestry-informative SNP markers and estimating admixture proportions. The method, implemented in the open-source software OpenADMIXTURE, shows scalability to modern biobank datasets in simulated and real data examples.
Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burden imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require known ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg problem: the labels needed to select AIMs are themselves the output of ancestry inference. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and freely available.
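
To make the notion of an ancestry-informative marker concrete, here is a minimal sketch of the supervised selection that the abstract contrasts against: ranking SNPs by the absolute allele-frequency difference between two groups whose ancestry labels are already known. It is not the unsupervised method of the paper and does not use OpenADMIXTURE. The function names (`allele_frequencies`, `rank_aims`), the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
def allele_frequencies(genotypes):
    """Per-SNP alternate-allele frequency.

    `genotypes` is a list of rows, one per individual, each holding
    0/1/2 alternate-allele counts (an assumed coding for this sketch).
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_snps)
    ]


def rank_aims(pop_a, pop_b, top=2):
    """Rank SNP indices by |frequency difference| between two labeled groups.

    This is the supervised baseline: it needs the group labels up front.
    """
    freq_a = allele_frequencies(pop_a)
    freq_b = allele_frequencies(pop_b)
    delta = [abs(a - b) for a, b in zip(freq_a, freq_b)]
    order = sorted(range(len(delta)), key=lambda j: delta[j], reverse=True)
    return order[:top], delta


# Hypothetical toy genotypes: 3 individuals per group, 3 SNPs.
pop_a = [[2, 0, 1], [2, 1, 1], [1, 0, 1]]
pop_b = [[0, 0, 1], [0, 1, 1], [1, 0, 2]]

top_snps, delta = rank_aims(pop_a, pop_b, top=2)
print(top_snps)  # SNP 0 ranks first: its frequency differs most between groups
```

In this toy example SNP 0 is the most informative marker and SNP 1, with identical frequencies in both groups, carries no ancestry information. The dependence on `pop_a` and `pop_b` being labeled in advance is exactly the limitation the abstract describes.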
