Article

Novel functional sequences uncovered through a bovine multiassembly graph

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.2101056118

Keywords

pangenome; genome graphs; reference genome; genetic diversity

Funding

  1. Functional Genomics Center Zurich
  2. Swiss National Science Foundation (SNF) [310030_185229]
  3. Swiss Federal Office for Agriculture, Bern

This study builds a diverse pangenome from multiple cattle assemblies, revealing genetic differences between breeds and capturing previously undetected genetic variation. Using whole-genome sequencing and transcriptome data, it uncovers genes and transcripts that cannot be detected with the conventional linear reference genome, opening new avenues for genetic investigation.
Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect: they lack millions of bases of unknown relevance and cannot reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, that are differentially expressed between Mycobacterium bovis-infected and noninfected cattle and that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads that were previously misaligned against the B. taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.
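To make the notions of "nonreference" and "private" bases concrete, here is a minimal sketch of how such totals could be tallied from a graph whose nodes are labeled with the assemblies that traverse them. It is an illustration only, not the authors' pipeline: the function `summarize_nodes`, the node representation, and the assembly labels are hypothetical, and the toy lengths are made up rather than taken from the paper.

```python
from collections import defaultdict


def summarize_nodes(nodes, reference):
    """Tally nonreference and private bases in a toy pangenome graph.

    nodes: iterable of (length_bp, set_of_assembly_labels) pairs (hypothetical format).
    reference: label of the assembly that serves as the linear reference.
    """
    nonref = 0
    private = defaultdict(int)
    for length, assemblies in nodes:
        # Bases on nodes the reference does not traverse are nonreference.
        if reference not in assemblies:
            nonref += length
        # Bases on nodes traversed by exactly one assembly are private to it.
        if len(assemblies) == 1:
            (only,) = assemblies
            private[only] += length
    return nonref, dict(private)


# Toy graph: made-up node lengths and labels, not data from the paper.
toy_nodes = [
    (1000, {"ref", "asm_a", "asm_b"}),  # shared by all assemblies
    (200, {"asm_a"}),                   # private to asm_a
    (50, {"asm_a", "asm_b"}),           # nonreference but shared
    (300, {"asm_b"}),                   # private to asm_b
]
nonref, private = summarize_nodes(toy_nodes, "ref")
```

In this toy case the nonreference total also counts the node shared by the two non-reference assemblies, so it can exceed the sum of the private totals.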
