Article

Large multiple sequence alignments with a root-to-leaf regressive method

Journal

NATURE BIOTECHNOLOGY
Volume 37, Issue 12, Pages 1466-+

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/s41587-019-0333-6

Funding

  1. Centre for Genomic Regulation
  2. Spanish Plan Nacional
  3. Spanish Ministry of Economy and Competitiveness, 'Centro de Excelencia Severo Ochoa'
  4. ERC Consolidator Grant from the European Commission [771209 ChrFL]

Multiple sequence alignments (MSAs) are used for structural(1,2) and evolutionary predictions(1,2), but the complexity of aligning large datasets requires the use of approximate solutions(3), including the progressive algorithm(4). Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up(5). We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around from the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes(6).
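
The root-to-leaf order described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a guide tree is already available, that each subtree is represented by its longest sequence, and that at most `n` representatives are aligned per batch; the names `Node`, `representative`, and `regressive_batches` are hypothetical. The sketch only lists which sequences would be handed to a third-party aligner and in what order. It performs no alignment and no merging.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Guide-tree node: internal nodes have children, leaves carry a sequence."""
    children: List["Node"] = field(default_factory=list)
    seq: Optional[str] = None


def representative(node: Node) -> str:
    """Return the longest leaf sequence under `node` (assumed selection rule)."""
    if node.seq is not None:
        return node.seq
    return max((representative(c) for c in node.children), key=len)


def regressive_batches(root: Node, n: int = 2) -> List[List[str]]:
    """List the batches of sequences to align, starting at the root.

    Each batch holds one representative per subtree, so the first batch
    spans the most distant parts of the tree and later batches cover
    progressively more similar sequences.
    """
    batches: List[List[str]] = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        if node.seq is not None:
            continue  # a leaf needs no further splitting
        # Expand the frontier until it holds n subtrees or only leaves remain.
        frontier = list(node.children)
        while len(frontier) < n and any(f.seq is None for f in frontier):
            i = next(i for i, f in enumerate(frontier) if f.seq is None)
            frontier[i:i + 1] = frontier[i].children
        batches.append([representative(f) for f in frontier])
        # Recurse into the internal subtrees that were just represented.
        queue.extend(f for f in frontier if f.seq is None)
    return batches


tree = Node(children=[
    Node(children=[Node(seq="AAAA"), Node(seq="AAA")]),
    Node(children=[Node(seq="CC"), Node(seq="CCCCC")]),
])
print(regressive_batches(tree, n=2))
# [['AAAA', 'CCCCC'], ['AAAA', 'AAA'], ['CC', 'CCCCC']]
```

In this toy output, the representative `AAAA` appears both in the root batch and in the batch for its own subtree. A shared sequence of this kind is what would let the separately computed sub-alignments be combined, and because every batch has at most `n` sequences, the cost of each aligner call stays bounded however large the full dataset is.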
