Article

A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes

Journal

MOLECULAR BIOLOGY AND EVOLUTION
Volume 19, Issue 4, Pages 554-562

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/oxfordjournals.molbev.a004111

Keywords

genomics; mitochondrial DNA; molecular phylogenetics; molecular systematics; sequence analysis; singular value decomposition

We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole-genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes composed of 832 total proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.
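
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The peptide length (k = 3), the retained rank, cosine similarity as the measure of relative orientation, and all function names are assumptions introduced here for illustration; the abstract specifies only that character string frequencies are decomposed with SVD, that protein vectors are compared by orientation, and that gene vectors are summed per species.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_frequency_matrix(proteins, k=3):
    """Count overlapping k-peptides: rows are peptides, columns are proteins.

    k=3 is an assumed string length; the abstract does not state one.
    """
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = np.zeros((len(index), len(proteins)))
    for col, seq in enumerate(proteins):
        for start in range(len(seq) - k + 1):
            row = index.get(seq[start:start + k])
            if row is not None:  # skip peptides with non-standard residues
                counts[row, col] += 1
    return counts


def svd_protein_vectors(counts, rank):
    """Represent each protein in the space of the top `rank` singular axes."""
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Scale each right singular vector by its singular value; one row per protein.
    return (s[:rank, None] * vt[:rank]).T


def cosine_similarity(vectors):
    """Pairwise cosine of the angle between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def species_vectors(vectors, species_labels):
    """Sum the protein vectors belonging to each species."""
    labels = np.asarray(species_labels)
    names = sorted(set(species_labels))
    summed = np.array([vectors[labels == name].sum(axis=0) for name in names])
    return names, summed
```

Under these assumptions, the similarity matrix from `cosine_similarity` could be converted to distances (for example, one minus the cosine) and passed to any standard tree-building method to obtain a gene tree, while the output of `species_vectors` could be compared the same way to obtain a species tree.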

