4.6 Article

RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

Journal

FRONTIERS IN GENETICS
Volume 12, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2021.655707

Keywords

RNA-seq; SNP calling; genotype calling; SNP annotation; allele-specific expression; livestock; chicken

Funding

  1. European Union [633531]
  2. French National Agency of Research [ANR-11-SVS7]
  3. EpiBird ANR project, from French institutions as Institut Agro-AGROCAMPUS OUEST [PCS-09-GENM-010]
  4. INRAE [ELASETIC project (2012)]
  5. France Genomique National infrastructure - Investissement d'avenir program [ANR-10-INBS-09]
  6. Brittany region (France)
  7. INRAE (Animal Genetics Division)
  8. INRAE [Fr-AgENCODE project (2015-2017)]
  9. ChickStress project [ANR-13ADAP]

RNA-seq data is a valuable yet unexploited resource for detecting SNPs and genotypes in various populations, especially in livestock species. This study compared SNP calling results using RNA-seq data in two chicken populations, proposing thresholds for genotype calling consistency and demonstrating the potential of RNA-seq data for gene expression regulation and population genetic analysis.
In addition to their common use in studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, including approximately 550,000 SNPs per tissue and population with a reliable GT (call rate >= 50%), of which approximately 340,000 had a MAF >= 10%.
We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulation of gene expression, with approximately 81% of protein-coding and 68% of long non-coding genes (TPM >= 1) that can be analyzed for ASE, and approximately 29% of them being cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
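The per-SNP quantities named in the abstract (GT call rate, MAF, and GT concordance between RNA-seq and DNA-seq) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the genotype encoding (alternate-allele count 0, 1, or 2, with `None` for a missing call) are assumptions made for illustration. Only the two thresholds (call rate >= 50%, MAF >= 10%) come from the abstract.

```python
from typing import Optional, Sequence

# Assumed encoding: alternate-allele count per bird (0, 1, 2), None = no call.
Genotype = Optional[int]


def call_rate(gts: Sequence[Genotype]) -> float:
    """Fraction of individuals with a called genotype at one SNP."""
    return sum(g is not None for g in gts) / len(gts)


def minor_allele_frequency(gts: Sequence[Genotype]) -> float:
    """MAF computed over called genotypes only (diploid assumption)."""
    called = [g for g in gts if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(gts: Sequence[Genotype],
                   min_call_rate: float = 0.5,
                   min_maf: float = 0.10) -> bool:
    """Apply the call-rate and MAF thresholds quoted in the abstract."""
    return (call_rate(gts) >= min_call_rate
            and minor_allele_frequency(gts) >= min_maf)


def concordance(rna: Sequence[Genotype],
                dna: Sequence[Genotype]) -> Optional[float]:
    """Share of identical genotypes among birds called in both datasets."""
    pairs = [(r, d) for r, d in zip(rna, dna)
             if r is not None and d is not None]
    if not pairs:
        return None
    return sum(r == d for r, d in pairs) / len(pairs)
```

For example, `passes_filters([0, 1, None, 2])` keeps the SNP (call rate 0.75, MAF 0.5), whereas a SNP called in only 3 of 7 birds would be dropped by the call-rate threshold.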
