TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution

METHODS IN ECOLOGY AND EVOLUTION (2021)

Journal

METHODS IN ECOLOGY AND EVOLUTION

Volume 12, Issue 11, Pages 2145-2158

Publisher

WILEY

DOI: 10.1111/2041-210X.13696

Keywords

alignment filtering; error removal; homology errors; phylogenomics

Funding

NSF [ACI-1053575, IIS-1845967, DEB-1655683]

Automated Summary

Erroneous data in sequence datasets reduce the accuracy of downstream analyses, and as datasets grow larger, automatic error detection becomes necessary. The paper introduces TAPER, a method that detects errors in small species-specific stretches of multiple sequence alignments rather than removing entire aligned sites.

Abstract

Erroneous data can creep into sequence datasets for reasons ranging from contamination to annotation and alignment mistakes and reduce the accuracy of downstream analyses. As datasets keep getting larger, it has become difficult to check multiple sequence alignments visually for errors, and thus, automatic error detection methods are needed more than ever before. Alignment masking methods, which are widely used, remove entire aligned sites and may reduce signal as much as or more than they reduce the noise. The alternative we propose here is a surprisingly under-explored approach: looking for errors in small species-specific stretches of the multiple sequence alignments. We introduce a method called TAPER that uses a novel two-dimensional outlier detection algorithm. Importantly, TAPER adjusts its null expectations per site and species, and in doing so, it attempts to distinguish the real heterogeneity (signal) from errors (noise). Our results show that TAPER removes very little data yet finds much of the error. The effectiveness of TAPER depends on several properties of the alignment (e.g. evolutionary divergence levels) and the errors (e.g. their length). By enabling data clean up with minimal loss of signal, TAPER can improve downstream analyses such as phylogenetic reconstruction and selection detection. Data errors, small or large, can reduce confidence in the downstream results, and thus, eliminating them can be beneficial even when downstream analyses are not impacted.
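
The contrast drawn in the abstract, between removing whole aligned sites and masking only short species-specific stretches, can be illustrated with a toy sketch. This is not the TAPER algorithm, whose two-dimensional outlier detection is not specified on this page. It is a deliberately simplified stand-in that assumes a majority-vote consensus, a fixed sliding window, and a hand-picked threshold. The function name `mask_divergent_stretches` and the parameters `window` and `threshold` are hypothetical. The only idea borrowed from the abstract is that the expected divergence is adjusted per site, so naturally variable columns count less toward flagging a stretch.

```python
from collections import Counter


def mask_divergent_stretches(alignment, window=5, threshold=0.7):
    """Toy per-sequence masking of divergent stretches (NOT the TAPER algorithm).

    alignment: list of equal-length strings, with '-' as the gap character.
    Returns a new list in which flagged stretches are replaced by 'X'
    in the affected sequence only; no column is removed.
    """
    n_cols = len(alignment[0])

    # Majority character and mismatch rate for each column (gaps ignored).
    consensus, site_rate = [], []
    for j in range(n_cols):
        column = [seq[j] for seq in alignment if seq[j] != "-"]
        major = Counter(column).most_common(1)[0][0] if column else "-"
        consensus.append(major)
        mismatches = sum(1 for c in column if c != major)
        site_rate.append(mismatches / len(column) if column else 0.0)

    masked = []
    for seq in alignment:
        # Excess divergence: mismatch indicator minus the site's own mismatch
        # rate, so variable sites contribute less (assumed simplification).
        excess = [
            (1.0 if seq[j] not in ("-", consensus[j]) else 0.0) - site_rate[j]
            for j in range(n_cols)
        ]
        flags = [False] * n_cols
        for start in range(n_cols - window + 1):
            if sum(excess[start:start + window]) / window >= threshold:
                for j in range(start, start + window):
                    flags[j] = True
        masked.append(
            "".join("X" if flags[j] and seq[j] != "-" else seq[j]
                    for j in range(n_cols))
        )
    return masked


# Made-up example: the last sequence carries a five-character erroneous stretch.
alignment = [
    "ACGTACGTACGT",
    "ACGTACGTACGT",
    "ACGTACGTACGT",
    "ACGTACGTACGT",
    "ACGTGGTACCGT",
]
masked = mask_divergent_stretches(alignment)
```

In this made-up example only five characters of one sequence are masked, whereas a site-removal filter acting on the same five columns would discard 25 characters across all five sequences.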
