Article

TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution

Journal

METHODS IN ECOLOGY AND EVOLUTION
Volume 12, Issue 11, Pages 2145-2158

Publisher

WILEY
DOI: 10.1111/2041-210X.13696

Keywords

alignment filtering; error removal; homology errors; phylogenomics

Funding

  1. NSF [ACI-1053575, IIS-1845967, DEB-1655683]

The paper addresses erroneous data in sequence datasets and the growing need for automatic error detection as datasets get larger. It introduces TAPER, a method that detects errors in short species-specific stretches of multiple sequence alignments to improve the accuracy of downstream analyses.

Erroneous data can creep into sequence datasets for reasons ranging from contamination to annotation and alignment mistakes, and they reduce the accuracy of downstream analyses. As datasets keep getting larger, it has become difficult to check multiple sequence alignments visually for errors, and thus, automatic error detection methods are needed more than ever before. Alignment masking methods, which are widely used, remove entire aligned sites and may reduce signal as much as or more than they reduce the noise. The alternative we propose here is a surprisingly under-explored approach: looking for errors in small species-specific stretches of the multiple sequence alignments. We introduce a method called TAPER that uses a novel two-dimensional outlier detection algorithm. Importantly, TAPER adjusts its null expectations per site and species, and in doing so, it attempts to distinguish real heterogeneity (signal) from errors (noise). Our results show that TAPER removes very little data yet finds much of the error. The effectiveness of TAPER depends on several properties of the alignment (e.g. evolutionary divergence levels) and of the errors (e.g. their length). By enabling data cleanup with minimal loss of signal, TAPER can improve downstream analyses such as phylogenetic reconstruction and selection detection. Data errors, small or large, can reduce confidence in downstream results, and thus, eliminating them can be beneficial even when downstream analyses are not impacted.
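The abstract does not spell out TAPER's actual algorithm, so the following is not TAPER. It is a minimal sketch, under our own assumptions, of the general idea of adjusting null expectations per site and per species before flagging species-specific stretches. The mismatch-to-consensus score, the two-way additive model (subtracting row and column means), and the `window` and `threshold` values are all hypothetical choices made for illustration. Subtracting the row mean discounts species that simply evolve fast overall, and subtracting the column mean discounts sites that are variable for everyone; only what remains is treated as a candidate error.

```python
from collections import Counter

# Illustrative sketch only; NOT the TAPER algorithm. All scoring choices are assumptions.

def mismatch_matrix(alignment):
    """1.0 where a residue differs from its column's most common non-gap residue, else 0.0."""
    n_sites = len(alignment[0])
    consensus = []
    for j in range(n_sites):
        counts = Counter(seq[j] for seq in alignment if seq[j] != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return [[1.0 if seq[j] != "-" and seq[j] != consensus[j] else 0.0
             for j in range(n_sites)] for seq in alignment]

def two_way_residuals(m):
    """Remove per-species (row) and per-site (column) means: r = x - row - col + grand."""
    n, length = len(m), len(m[0])
    row = [sum(r) / length for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(length)]
    grand = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + grand for j in range(length)] for i in range(n)]

def flag_stretches(alignment, window=3, threshold=0.3):
    """Return (species_index, start, end) half-open stretches whose windowed mean residual
    exceeds the (hypothetical) threshold; overlapping flagged windows are merged."""
    res = two_way_residuals(mismatch_matrix(alignment))
    stretches = []
    for i, row in enumerate(res):
        current = None  # [start, end) of the stretch being extended
        for s in range(len(row) - window + 1):
            if sum(row[s:s + window]) / window > threshold:
                if current is not None and s <= current[1]:
                    current[1] = s + window  # extend the overlapping stretch
                else:
                    if current is not None:
                        stretches.append((i, current[0], current[1]))
                    current = [s, s + window]
        if current is not None:
            stretches.append((i, current[0], current[1]))
    return stretches

aln = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTGTACAC",  # species 4 carries a 4-residue erroneous stretch at sites 4-7
]
print(flag_stretches(aln))  # → [(4, 4, 8)]
```

Here the only flagged stretch is the one planted in species 4, reported as the half-open interval of sites 4 to 8. A real method would need a principled null model and calibrated thresholds rather than the fixed values assumed here.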
