Article

Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project

Journal

GENES
Volume 13, Issue 1, Pages -

Publisher

MDPI
DOI: 10.3390/genes13010044

Keywords

1000 Genomes Project; population genetics; batch effect; sequencing center

Funding

  1. Spanish Ministry of Science and Innovation
  2. Centro de Excelencia Severo Ochoa
  3. CERCA Programme/Generalitat de Catalunya
  4. Spanish Ministry of Science and Innovation through the Instituto de Salud Carlos III
  5. Generalitat de Catalunya through the Departament de Salut
  6. Departament d'Empresa i Coneixement
  7. European Regional Development Fund by the Spanish Ministry of Science and Innovation
  8. Secretaria d'Universitats i Recerca, Departament d'Empresa i Coneixement of the Generalitat de Catalunya
  9. Ministerio de Economia y Competitividad (Ministry of Economy and Competitiveness) [RYC-2013-14797, BFU2015-68759-P, PGC2018-098574-B-I00]
  10. Generalitat de Catalunya (Government of Catalonia) [GRC 2017 SGR 937]
  11. Government of Catalonia Agencia de Gestio d'Ajuts Universitaris i de Recerca (Agency for Management of University and Research Grants) [GRC 2014 SGR 615]

The 1000 Genomes Project (1000G) is a valuable dataset for genomics research. Recent studies have found ghost mutation signals in 1000G, which can affect follow-up studies. This study demonstrates the association between sequencing center and loss-of-function alleles, singletons, and patterns of archaic introgression in 1000G.
The 1000 Genomes Project (1000G) is one of the most popular whole-genome sequencing datasets. It is used across genomics and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear for low-frequency genetic variants. In this study, we analyze the effect of the sequencing center on predicted loss-of-function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF alleles and singletons independently of the population considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics based on low-frequency variants.
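
To make the idea of a sequencing-center batch effect concrete, the following is a minimal sketch of one way to test whether per-sample singleton counts within a single population differ among centers. It is not the authors' method: the permutation test, the function name center_effect_permutation_test, the center labels "A" and "B", and the counts are all illustrative assumptions, and the numbers are invented toy values rather than 1000G data.

```python
import random
from statistics import mean


def center_effect_permutation_test(counts, centers, n_perm=2000, seed=0):
    """Permutation test for a center effect on per-sample counts.

    Hypothetical helper, not taken from the paper. The test statistic is
    the sum of squared deviations of the per-center means from their mean.
    """
    def between_center_spread(labels):
        groups = {}
        for value, label in zip(counts, labels):
            groups.setdefault(label, []).append(value)
        center_means = [mean(values) for values in groups.values()]
        grand = mean(center_means)
        return sum((m - grand) ** 2 for m in center_means)

    rng = random.Random(seed)
    observed = between_center_spread(centers)
    shuffled = list(centers)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any real link between center and count.
        rng.shuffle(shuffled)
        if between_center_spread(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy data: singleton counts for eight samples from one population,
# sequenced at two hypothetical centers "A" and "B".
singleton_counts = [100, 104, 98, 101, 150, 155, 148, 152]
sequencing_centers = ["A", "A", "A", "A", "B", "B", "B", "B"]

observed, p_value = center_effect_permutation_test(
    singleton_counts, sequencing_centers
)
print(f"between-center spread = {observed:.3f}, permutation p = {p_value:.4f}")
```

Restricting the comparison to samples from one population at a time is what separates a center effect from genuine population differences; with real data the same test would be repeated per population and corrected for multiple testing.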
