Article

SCAMPP: Scaling Alignment-Based Phylogenetic Placement to Large Trees

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2022.3170386

Keywords

Phylogeny; Scalability; Biological system modeling; Numerical models; Maximum likelihood estimation; Licenses; Hamming distance; Phylogenetic placement; maximum likelihood; phylogenetics; pplacer; EPA-ng

SCAMPP is a technique that extends likelihood-based phylogenetic placement methods, such as pplacer and EPA-ng, to ultra-large backbone trees. It handles backbone trees with 50,000 or more leaves (up to 200,000 in the reported experiments) and is more accurate than APPLES and APPLES-2, two fast placement methods designed for ultra-large datasets.
Phylogenetic placement, the problem of placing a query sequence into a precomputed phylogenetic backbone tree, is useful for constructing large trees, performing taxon identification of newly obtained sequences, and other applications. The most accurate current methods, such as pplacer and EPA-ng, are based on maximum likelihood and require that the query sequence be provided within a multiple sequence alignment that includes the leaf sequences in the backbone tree. This approach enables high accuracy but also makes these likelihood-based methods computationally intensive on large backbone trees, and can even lead to them failing when the backbone trees are very large (e.g., having 50,000 or more leaves). We present SCAMPP (SCaling AlignMent-based Phylogenetic Placement), a technique to extend the scalability of these likelihood-based placement methods to ultra-large backbone trees. We show that pplacer-SCAMPP and EPA-ng-SCAMPP both scale well to ultra-large backbone trees (even up to 200,000 leaves), with accuracy that improves on APPLES and APPLES-2, two recently developed fast phylogenetic placement methods that scale to ultra-large datasets. EPA-ng-SCAMPP and pplacer-SCAMPP are available at https://github.com/chry04/PLUSplacer.
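The abstract requires the query to be aligned with the backbone leaf sequences, and the keywords list Hamming distance. As a minimal sketch of how those two pieces could fit together, the Python below picks the backbone leaf closest to a query by Hamming distance over aligned sequences. The abstract does not spell out this step, so treat it as an illustration under assumptions rather than the authors' implementation: the function names, the toy sequences, and the choice to skip gap columns are all hypothetical.

```python
def hamming_distance(a, b, gap="-"):
    """Count mismatching columns between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x != gap and y != gap)


def nearest_leaf(query, backbone):
    """Return the name of the backbone leaf with the smallest distance to query.

    `backbone` maps leaf names to aligned sequences (hypothetical layout).
    Ties go to the leaf that appears first in the mapping.
    """
    return min(backbone, key=lambda name: hamming_distance(query, backbone[name]))


# Toy alignment: three backbone leaves and one query, all of length 10.
backbone = {
    "leaf_A": "ACGT-ACGTA",
    "leaf_B": "ACGTTACGTT",
    "leaf_C": "TTGTTACCTA",
}
query = "ACGTTACGTA"

closest = nearest_leaf(query, backbone)
print(closest)
```

A leaf selected this way could serve as the centre of a smaller subtree that a likelihood-based method such as pplacer or EPA-ng can process, which is one plausible route to the scalability the abstract describes.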
