Article

Estimating F-ST and kinship for arbitrary population structures

Journal

PLOS GENETICS
Volume 17, Issue 1, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pgen.1009241

Keywords

-

Funding

  1. National Institutes of Health, National Human Genome Research Institute [R01HGOC6448]


Kinship coefficients and F-ST are key parameters in modern population genetics studies, but existing estimators are biased due to restrictive assumptions when real datasets are considered. This study found that existing estimators can lead to severe biases and proposed a new estimation framework that is practically unbiased for any population structure. The new approach, demonstrated through theory and simulations, has the potential to significantly improve downstream analyses requiring accurate kinship and F-ST estimates.
F-ST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently used estimators of F-ST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of F-ST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing F-ST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally assumed models of structure. We then present our new approach, which consistently estimates kinship and F-ST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and F-ST estimates.
Author summary

Kinship coefficients and F-ST, which measure relatedness and population structure, respectively, are important quantities needed to accurately perform various analyses on genetic data, including genome-wide association studies and heritability estimation. However, existing estimators require restrictive assumptions of independence that are not met by real human and other datasets. In this work we find that existing estimators can be severely biased under reasonable scenarios, first by theoretically determining their properties, and then using an admixture simulation to illustrate our findings. In particular, we find that existing F-ST estimators are downwardly biased, and that existing kinship matrix estimators have related biases that are on average downward and of similar magnitude but vary for every pair of individuals. These insights led us to a new estimation framework for kinship and F-ST that is practically unbiased for any population structure, as demonstrated by theory and simulations. Our new approaches, available as open-source R packages, are easy to use and are more widely applicable than existing approaches, and they are likely to improve downstream analyses that require accurate kinship and F-ST estimates.
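
The role of the minimum kinship value can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and everything specific in it is an assumption not stated in the text above: the moment statistic built from genotypes coded as 0, 1 or 2 allele copies, the choice to treat the least related pair in the sample as having kinship zero, the unweighted mean of inbreeding coefficients used as F-ST, and the hypothetical names `kinship_sketch` and `fst_sketch`.

```python
from statistics import mean


def kinship_sketch(genotypes):
    """Estimate a kinship matrix from allele counts (illustrative only).

    genotypes: list of loci; each locus is a list with one allele count
    (0, 1 or 2) per individual. Assumes no missing data.
    """
    m = len(genotypes)      # number of loci
    n = len(genotypes[0])   # number of individuals

    # Moment statistic (assumed form): average over loci of
    # (x_j - 1) * (x_k - 1) - 1 for every pair of individuals j, k.
    a = [
        [sum((row[j] - 1) * (row[k] - 1) - 1 for row in genotypes) / m
         for k in range(n)]
        for j in range(n)
    ]

    # Assumption: the least related pair has kinship zero, so the smallest
    # off-diagonal value of the statistic sets the scale for all pairs.
    a_min = min(a[j][k] for j in range(n) for k in range(n) if j != k)
    if a_min == 0:
        raise ValueError("minimum statistic is zero; cannot rescale")

    return [[1 - a[j][k] / a_min for k in range(n)] for j in range(n)]


def fst_sketch(kinship):
    """F-ST taken as the unweighted mean inbreeding coefficient (assumption).

    The inbreeding coefficient of individual j is 2 * kinship[j][j] - 1.
    """
    return mean(2 * kinship[j][j] - 1 for j in range(len(kinship)))
```

Every entry of the result depends on the single rescaling constant `a_min`, so an error in the minimum kinship estimate would propagate to every pair of individuals and to the F-ST value derived from the diagonal.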
