Article

Efficiency of different strategies to mitigate ascertainment bias when using SNP panels in diversity studies

BMC GENOMICS (2018)

Journal

BMC GENOMICS

Volume 19, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/s12864-017-4416-9

Keywords

SNP filtering; Ascertainment bias; LD based pruning; SNP panels

Categories

Biotechnology & Applied Microbiology; Genetics & Heredity

Funding

German Federal Ministry of Education and Research [FKZ 0315528E]
Erasmus Mundus (through INSPIRE)

Abstract

Background: Single nucleotide polymorphism (SNP) panels have been widely used to study genomic variation within and between populations. Methods of SNP discovery have been debated because they can introduce ascertainment bias, so genetic diversity results obtained from SNP genotype data can be misleading. We used a total of 42 chicken populations for which both individually genotyped array data and pooled whole genome resequencing (WGS) data were available. We compared allele frequency distributions and genetic diversity measures (expected heterozygosity (He), fixation index (FST) values, genetic distances and principal components analysis (PCA)) between the two data types. To the array data, we applied different filtering options (retaining SNPs polymorphic in samples of two Gallus gallus wild populations, linkage disequilibrium (LD) based pruning, minor allele frequency (MAF) filtering, and combinations thereof) to assess their potential to mitigate the ascertainment bias.
Results: Rare SNPs were underrepresented in the array data. Array data consistently overestimated He compared to WGS data but ranked the breeds similarly, as demonstrated by Spearman's rank correlations ranging between 0.956 and 0.985. LD based pruning resulted in a reduced overestimation of He compared to the other filters and slightly improved the relationship with the WGS results. The raw array data and the data restricted to SNPs polymorphic in the wild samples underestimated pairwise FST values between breeds that had low FST (< 0.15) in the WGS data, and overestimated them for breeds with high WGS FST (> 0.15). LD based pruned data underestimated FST in a consistent manner. The genetic distance matrix from LD pruned data was more closely related to that of WGS than those of the other array versions. PCA was rather robust in all array versions: the population structure on the PCA plot was generally well captured in comparison to the WGS data.
Conclusions: Among the tested filtering strategies, LD based pruning mitigated the effects of ascertainment bias most effectively, producing results most comparable to those obtained from WGS data, and is therefore recommended for practical use.
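
The link between underrepresented rare SNPs and an overestimated He can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes biallelic SNPs, the textbook per-locus formula He = 2p(1 - p) averaged over loci, and made-up allele frequencies. The function names and the MAF threshold are hypothetical, and the MAF filter only stands in for a marker set that lacks rare variants.

```python
def expected_heterozygosity(freqs):
    """Mean expected heterozygosity over biallelic SNPs, He = 2p(1 - p)."""
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


def maf_filter(freqs, threshold):
    """Keep SNPs whose minor allele frequency is at least `threshold`."""
    return [p for p in freqs if min(p, 1.0 - p) >= threshold]


# Hypothetical allele frequencies for one population (not data from the paper).
# The full list mimics sequence data, which also captures rare variants.
wgs_like = [0.01, 0.02, 0.05, 0.10, 0.30, 0.50]

# Dropping the rarest SNPs mimics a marker set in which rare variants are
# underrepresented (assumed threshold of 0.05).
array_like = maf_filter(wgs_like, 0.05)

he_wgs = expected_heterozygosity(wgs_like)
he_array = expected_heterozygosity(array_like)

print(f"He with rare SNPs included: {he_wgs:.3f}")
print(f"He with rare SNPs removed:  {he_array:.3f}")
```

Because rare SNPs contribute per-locus values close to zero, removing them raises the average, which is the direction of bias the abstract reports for the array data.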
