4.7 Article

Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project

Journal

GENOMICS
Volume 111, Issue 4, Pages 808-818

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ygeno.2018.05.004

Keywords

Quality control; Whole genome sequencing; Atlas; GATK; Mendelian inconsistencies; Consensus calling

Funding

  1. NIA [U01 AG032984, R01 AG033193, U24AG021886, U01AG016976, U24AG041689]
  2. NHLBI [HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, 5RC2HL102419, HL105756]
  3. Austrian Stroke Prevention Study (ASPS) [5RC2HL102419, HL105756]
  4. Cardiovascular Health Study (CHS) [5RC2HL102419, HL105756]
  5. Erasmus Rucphen Family Study (ERF) [5RC2HL102419, HL105756]
  6. Framingham Heart Study (FHS) [5RC2HL102419, HL105756]
  7. Rotterdam Study (RS) [5RC2HL102419, HL105756]
  8. National Heart, Lung, and Blood Institute (NHLBI)
  9. Adult Changes in Thought (ACT)
  10. Alzheimer's Disease Centers (ADC)
  11. Chicago Health and Aging Project (CHAP)
  12. Memory and Aging Project (MAP)
  13. Mayo Clinic (MAYO)
  14. Mayo Parkinson's Disease controls, University of Miami
  15. Multi-Institutional Research in Alzheimer's Genetic Epidemiology Study (MIRAGE)
  16. National Cell Repository for Alzheimer's Disease (NCRAD)
  17. National Institute on Aging Late Onset Alzheimer's Disease Family Study (NIA-LOAD)
  18. Religious Orders Study (ROS)
  19. Texas Alzheimer's Research and Care Consortium (TARC)
  20. Vanderbilt University/Case Western Reserve University (VAN/CWRU)
  21. Washington Heights-Inwood Columbia Aging Project (WHICAP)
  22. Washington University Sequencing Project (WUSP)
  23. Columbia University Hispanic-Estudio Familiar de Influencia Genetica de Alzheimer (EFIGA)
  24. University of Toronto (UT)
  25. Genetic Differences (GD)
  26. Human Genome Sequencing Center at the Baylor College of Medicine [U54 HG003273]
  27. Broad Institute Genome Center [U54HG003067]
  28. Washington University Genome Institute [U54HG003079]
  29. NIH
  30. Intramural Research Program of the National Institutes of Health, National Library of Medicine
  31. [UF1AG047133]
  32. [U01AG049505]
  33. [U01AG049506]
  34. [U01AG049507]
  35. [U01AG049508]
  36. [U54AG052427]


The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed consensus calling, to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and ~14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of genotypes were discordant and were excluded. For indels, variant-level QC excluded ~36.8% of GATK and ~35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and the ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
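The two core rules in the abstract, merging genotype calls from two pipelines and counting Mendelian inconsistencies in parent-offspring pairs, can be illustrated with a minimal sketch. It assumes genotypes at a bi-allelic site are encoded as alternate-allele counts (0, 1, 2) with `None` for a missing or QC-removed call. The function names `consensus_genotype`, `is_pair_mendelian_inconsistent`, and `mi_rate` are hypothetical and do not come from the ADSP QC Working Group's code. The merge follows the rule stated in the abstract: concordant calls are kept, calls present in only one pipeline are kept, and discordant calls are excluded.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: alternate-allele count at a bi-allelic site.
# 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing or removed by QC.
Genotype = Optional[int]


def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Genotype:
    """Combine one sample's genotype from two pipelines (illustrative rule set).

    - called by both and identical -> keep (concordant)
    - called by only one pipeline  -> keep that call (pipeline-exclusive)
    - called by both but different -> set to missing (discordant, excluded)
    """
    if gatk is None:
        return atlas  # Atlas-exclusive call, or missing in both
    if atlas is None:
        return gatk  # GATK-exclusive call
    return gatk if gatk == atlas else None


def is_pair_mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """A parent and offspring must share at least one allele.

    At a bi-allelic site this fails only when one member is hom-ref and the
    other is hom-alt. Pairs with a missing call are not counted as errors.
    """
    if parent is None or child is None:
        return False
    return abs(parent - child) == 2


def mi_rate(pairs: Sequence[Tuple[Genotype, Genotype]]) -> float:
    """Fraction of fully called (parent, child) pairs that are inconsistent."""
    called = [(p, c) for p, c in pairs if p is not None and c is not None]
    if not called:
        return 0.0
    inconsistent = sum(is_pair_mendelian_inconsistent(p, c) for p, c in called)
    return inconsistent / len(called)


if __name__ == "__main__":
    print(consensus_genotype(1, 1))     # concordant -> 1
    print(consensus_genotype(None, 2))  # Atlas-exclusive -> 2
    print(consensus_genotype(0, 1))     # discordant -> None
    print(mi_rate([(0, 2), (0, 1), (1, 1), (None, 2)]))
```

Checking a pair rather than a full trio is deliberate here, since the abstract reports MIs over parent-offspring pairs; with only one parent, the only detectable error is an opposite-homozygote pair. Real QC would additionally stratify such rates by genotype quality or depth to choose filter thresholds, which this sketch does not attempt.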

