Article

AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.2113075119

Keywords

sensitive genome alignment; whole-genome duplication; genome comparison; transposable element variation; regulatory element alignment

Funding

  1. United States Department of Agriculture Agricultural Research Service
  2. NSF [1854828, 1822330]
  3. European Union [825111]
  4. European Union Regional Development Fund [001-P-001723]
  5. National Natural Science Foundation of China [31900486]
  6. NSF Postdoctoral Research Fellowship in Biology [1907343]
  7. Spanish Ministry of Economy, Industry, and Competitiveness under Ramon y Cajal (RYC) [RYC-2016-21104]
  8. Division Of Integrative Organismal Systems
  9. Direct For Biological Sciences [1907343, 1822330] Funding Source: National Science Foundation

This study introduces a genome alignment method called AnchorWave, which shows significant improvement when applied to species with complex genomes. It can accurately identify multikilobase indels and improve the recall rate of transcription factor-binding sites.
Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication-informed collinear anchor identification between genomes and performs base pair-resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor-binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
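
To illustrate why a two-piece affine gap cost helps with multikilobase indels, the sketch below compares it with a single affine cost. It uses the general definition of a two-piece affine model (the cheaper of two linear cost functions) and is not AnchorWave's implementation. The function names and all penalty values are hypothetical, chosen only to make the crossover visible; the abstract does not state the parameters AnchorWave uses.

```python
def affine_gap_cost(length, gap_open=4, gap_extend=2):
    """Cost of a gap of `length` bases under a single affine model.

    The penalty values are illustrative assumptions, not AnchorWave defaults.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length


def two_piece_affine_gap_cost(length, open1=4, extend1=2, open2=80, extend2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The cost is the minimum of two affine functions: one with a cheap
    opening and expensive extension (favoured by short gaps), and one
    with an expensive opening and cheap extension (favoured by long gaps).
    All four penalties are hypothetical values chosen for illustration.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    short_piece = open1 + extend1 * length
    long_piece = open2 + extend2 * length
    return min(short_piece, long_piece)


if __name__ == "__main__":
    # Compare the two models for a short gap and for TE-sized gaps.
    for gap_length in (1, 10, 76, 1000, 5000):
        single = affine_gap_cost(gap_length)
        two_piece = two_piece_affine_gap_cost(gap_length)
        print(f"gap={gap_length:>5}  single affine={single:>6}  two-piece={two_piece:>6}")
```

With these illustrative penalties the two pieces cost the same at a gap length of 76 bases. Shorter gaps are scored exactly as in the single affine model, while longer gaps are charged at the lower extension rate, so one long indel is penalised far less than it would be under the single affine model.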
