Article

An Efficient, Nonphylogenetic Method for Detecting Genes Sharing Evolutionary Signals in Phylogenomic Data Sets

Journal

GENOME BIOLOGY AND EVOLUTION
Volume 13, Issue 9

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gbe/evab187

Keywords

bioinformatics; phylogenomics; gene coevolution; clustering; Archaea

Funding

  1. Simons Foundation Collaboration on the Origins of Life [339603]
  2. NSF Integrated Earth Systems Program [1615426]
  3. Geisel School of Medicine at Dartmouth's Center for Quantitative Biology through a grant from the National Institute of General Medical Sciences of the National Institutes of Health [P20GM130454]
  4. CNPq
  5. Division of Earth Sciences
  6. Directorate for Geosciences [1615426] (funding source: National Science Foundation)

I-ES is a method for assessing shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances, avoiding comparisons between gene tree topologies. It allows for many-to-many pairing of similarly evolving gene families and shows comparable accuracy to popular tree-based methods on simulated gene family data sets.
Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index (I-ES), a means to assess shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances. Using pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, I-ES enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting the set of non-overlapping pairs of copies, one from each assessed family, that yields the least sum of squared residuals. Analyses of simulated gene family data sets show that I-ES's accuracy is on par with that of popular tree-based methods, while also being less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying I-ES to an empirical data set of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, which occur as both single-copy and multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and enriched toward specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of interphyla horizontal gene transfer. An I-ES implementation is available at https://github.com/lthiberiol/evolSimIndex.
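The pairing step described in the abstract can be illustrated with a minimal Python sketch. It is not the evolSimIndex implementation, and every function name in it is hypothetical. It rests on several simplifying assumptions: the regression is unweighted and forced through the origin (the paper uses a weighted model), both families have the same number of copies, copies may only be paired within the same genome, and the best pairing is found by brute force over all permutations, which is only feasible for very small families.

```python
import itertools
import math


def odr_slope_through_origin(x, y):
    """Slope b of y = b*x minimizing summed squared orthogonal distances."""
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when sum(x*y) is zero")
    diff = syy - sxx
    # Positive root of sxy*b^2 + (sxx - syy)*b - sxy = 0.
    return (diff + math.sqrt(diff * diff + 4 * sxy * sxy)) / (2 * sxy)


def orthogonal_ssr(x, y):
    """Sum of squared orthogonal residuals around the fitted line."""
    b = odr_slope_through_origin(x, y)
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / (1 + b * b)


def upper_triangle(matrix, order):
    """Pairwise distances above the diagonal, with rows taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def best_pairing(dist_a, dist_b, genomes_a, genomes_b):
    """Brute-force search for the pairing of copies with the least SSR.

    perm[i] is the index of the family-B copy paired with family-A copy i.
    Assumes equal copy numbers; pairs must come from the same genome.
    """
    n = len(dist_a)
    x = upper_triangle(dist_a, list(range(n)))
    best = None
    for perm in itertools.permutations(range(n)):
        if any(genomes_a[i] != genomes_b[perm[i]] for i in range(n)):
            continue  # skip pairings that cross genomes
        ssr = orthogonal_ssr(x, upper_triangle(dist_b, list(perm)))
        if best is None or ssr < best[0]:
            best = (ssr, perm)
    return best


# Toy data (invented for illustration): genome g1 holds two copies per family.
genomes_a = ["g1", "g1", "g2", "g3"]
genomes_b = ["g1", "g1", "g2", "g3"]
dist_a = [
    [0.0, 0.1, 0.5, 0.9],
    [0.1, 0.0, 0.6, 0.8],
    [0.5, 0.6, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
# Family B evolves twice as fast, and its two g1 copies are listed in swapped order.
swap = [1, 0, 2, 3]
dist_b = [[2 * dist_a[swap[i]][swap[j]] for j in range(4)] for i in range(4)]

ssr, pairing = best_pairing(dist_a, dist_b, genomes_a, genomes_b)
print(pairing, ssr)
```

In this toy case the search should recover the swapped g1 copies, because that pairing makes the two sets of distances exactly proportional. A practical implementation would replace the exhaustive permutation search with something that scales to larger families.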
