Article

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

Journal

PLOS GENETICS
Volume 17, Issue 1

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pgen.1009315

Funding

  1. National Institutes of Health [R01 HG010086, OT2-OD002751, R35-CA197449, U19-CA203654, U01-HG009088]

Inferring familial relationships from whole-genome genetic data is crucial for genome-wide association studies, but current leading methods may not scale efficiently to very large cohorts. RAFFI is a fast and accurate relationship inference method that leverages IBD segments. It is robust against phasing and genotyping errors, is more accurate than KING, especially for distant relatives, and runs approximately 18 times faster than KING on the UK Biobank (~500K individuals).
Inference of relationships from the whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficient (φ) and the genome-wide probability of zero IBD sharing (π₀) for all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale to very large cohorts (e.g., sample sizes >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first leverages the efficient RaPID method to call IBD segments, and then estimates φ and π₀ from the detected IBD segments. This inference uses a data-driven approach that adjusts the estimates based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and that it achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI to offer fast and accurate relatedness inference for even larger cohorts.

Author summary

Inferring familial relationships has a wide range of applications. Both family-based and population-based genome-wide association studies (GWAS) require knowledge of genetic relationships. Relationship inference is essential when familial structures are unknown, and it can be used to correct pedigree information compromised by false paternity, sample switches, or unregistered adoption. Current approaches for inferring relationships do not scale to large cohorts comprising millions of individuals. Here, we present a fast and flexible method, called RAFFI, that uses identical-by-descent (IBD) segments. IBD segments are uninterrupted DNA segments inherited from a common ancestor.
Relationships are usually inferred by computing the kinship coefficients and the genome-wide probability of zero IBD sharing for all pairs of individuals. In the first step, we search for IBD segments using RaPID, which avoids pairwise comparison of all individuals in a haplotype panel. In the second step, we compute the kinship coefficients to infer the relationships. To make our method robust against genotyping and phasing errors, the kinship-coefficient thresholds for different degrees of relatedness are adjusted. As a result, the reduced power to detect IBD segments caused by phasing errors or a misspecified genotyping error rate does not compromise the inference of relationships.
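The two steps can be illustrated with a minimal sketch. This is not RAFFI's implementation. It assumes IBD segments are already available as haplotype-level intervals `(hap_of_i, hap_of_j, start_cM, end_cM)` on a single genome-wide cM coordinate (a hypothetical format, not RaPID's actual output). It uses the standard identity φ = k1/4 + k2/2, which equals the total IBD length summed over the four haplotype pairs divided by 4L, and estimates π₀ as the fraction of the genome covered by no segment. Degrees are assigned with KING's published kinship cutoffs. The `power` parameter (fraction of true IBD detected) and the π₀ cutoff separating parent-offspring from full siblings are hypothetical stand-ins for RAFFI's data-driven threshold adjustment, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical segment format: (hap_of_i, hap_of_j, start_cM, end_cM),
# with positions on one genome-wide cM coordinate.

# KING-style kinship cutoffs: lower bounds of each degree on a log2 scale.
KING_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # ~0.354
    (2 ** -2.5, "1st degree"),         # ~0.177
    (2 ** -3.5, "2nd degree"),         # ~0.088
    (2 ** -4.5, "3rd degree"),         # ~0.044
]


def union_length(intervals):
    """Total length covered by the union of (start, end) intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def estimate_phi_pi0(segments, genome_length_cm):
    """Naive phi and pi0 estimates from haplotype-level IBD segments."""
    by_pair = defaultdict(list)
    for hap_i, hap_j, start, end in segments:
        by_pair[(hap_i, hap_j)].append((start, end))
    # Merge within each haplotype pair so overlapping calls are not double-counted;
    # IBD2 regions are counted twice (two haplotype pairs), IBD1 regions once.
    shared = sum(union_length(iv) for iv in by_pair.values())
    phi = shared / (4.0 * genome_length_cm)
    # pi0: fraction of the genome where no haplotype pair shares IBD.
    covered = union_length([(s, e) for _, _, s, e in segments])
    pi0 = 1.0 - covered / genome_length_cm
    return phi, pi0


def classify_relationship(phi, pi0, power=1.0, pi0_po_cutoff=0.1):
    """Assign a degree; `power` and `pi0_po_cutoff` are hypothetical knobs."""
    # Hypothetical model: only a fraction `power` of true IBD is detected, so the
    # observed IBD-covered fraction (1 - pi0) shrinks by that factor; undo it.
    adj_pi0 = 1.0 - min(1.0, (1.0 - pi0) / power)
    for cutoff, label in KING_CUTOFFS:
        # Lower each kinship threshold in proportion to detection power.
        if phi > cutoff * power:
            if label == "1st degree":
                # Parent-offspring share IBD almost everywhere; full sibs have pi0 ~ 0.25.
                return "parent-offspring" if adj_pi0 < pi0_po_cutoff else "full siblings"
            return label
    return "unrelated or more distant"
```

For example, a toy parent-offspring pair on a 100 cM genome with segments `(0, 0, 0, 60)` and `(0, 1, 60, 100)` gives φ = 0.25 and π₀ = 0. If only 80% of that IBD is detected, the pair would be misread as full siblings at `power=1.0` but is recovered at `power=0.8`. Lowering thresholds rather than rescaling φ mirrors the threshold adjustment described in the text.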
