Pathological rate matrices: from primates to pathogens

BMC BIOINFORMATICS (2008)

Journal

BMC BIOINFORMATICS

Volume 9

Publisher

BIOMED CENTRAL LTD

DOI: 10.1186/1471-2105-9-550

Funding

ARC
NHMRC
NIH [P01DK078669]
Singapore Ministry of Education AcRF [R-155-050-054-133/R-155-050-054-101]

Abstract

Background: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and considered the suitability of different algorithms for their computation.

Results: We used concatenated protein-coding gene alignments from microbial genomes and primate genomes, and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both the Taylor and eigendecomposition algorithms exhibited substantial error rates: ~100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm, while ~10% of codon positions 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ~30% of codon matrices, were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from the occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices; these were attributable to machine precision. Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3x faster than eigendecomposition on the same matrices.

Conclusion: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm.
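
The error criterion named in the abstract (occurrence of an invalid probability) and the idea of scaling and squaring can be illustrated with a small sketch. This is a toy example only, not the authors' implementation or data: the two-state rate matrix, the function names and the tolerance are all hypothetical, and the scaled routine wraps a short Taylor series rather than the Padé approximant evaluated in the paper. The failure shown is plain truncation error at a large matrix norm, which is a different cause from the unobserved-state problem reported above.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def taylor_expm(q, t, terms=20):
    """Truncated Taylor series: sum over k < terms of (q*t)^k / k!."""
    a = [[x * t for x in row] for row in q]
    result = identity(len(q))
    term = identity(len(q))
    for k in range(1, terms):
        # Build (q*t)^k / k! from the previous term.
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + s for r, s in zip(res_row, term_row)]
                  for res_row, term_row in zip(result, term)]
    return result

def scaled_expm(q, t, terms=20):
    """Scaling and squaring: exp(A) = exp(A / 2^s)^(2^s).

    Assumption: a Taylor series is used on the scaled matrix here,
    standing in for the Pade approximant discussed in the abstract.
    """
    norm = max(sum(abs(x * t) for x in row) for row in q)  # infinity norm
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    p = taylor_expm(q, t / 2 ** s, terms)
    for _ in range(s):
        p = mat_mul(p, p)  # undo the scaling by repeated squaring
    return p

def has_invalid_probability(p, tol=1e-9):
    """Conservative error check: any entry outside [0, 1] (within tol)."""
    return any(x < -tol or x > 1 + tol for row in p for x in row)

# Hypothetical two-state rate matrix; rows sum to zero.
q = [[-1.0, 1.0], [1.0, -1.0]]
p_taylor = taylor_expm(q, 20.0)
p_scaled = scaled_expm(q, 20.0)
print(has_invalid_probability(p_taylor))  # True
print(has_invalid_probability(p_scaled))  # False
```

With a branch length of 20 the unscaled series is truncated long before its terms start to shrink, so the result contains values far outside [0, 1]. Scaling the matrix down first keeps the series well behaved, and the squaring steps recover the full-length result.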
