Article

PSI-BLAST pseudocounts and the minimum description length principle

Journal

NUCLEIC ACIDS RESEARCH
Volume 37, Issue 3, Pages 815-824

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkn981

Funding

  1. Intramural NIH HHS Funding Source: Medline

Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
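The abstract describes PSSM column scores built by adding pseudocounts to the observed amino acid counts in an alignment column. The sketch below illustrates that general mechanism with the classic PSI-BLAST-style mixture Q_i = (alpha*f_i + beta*g_i)/(alpha + beta). Here f_i are observed frequencies, g_i are pseudocount frequencies derived from substitution-matrix target frequencies, alpha weights the observations, and beta is the number of pseudocounts. It is not the new method introduced in this article. Several simplifications are assumptions made for illustration only: a hypothetical 3-letter alphabet with toy background and target frequencies stands in for the 20 amino acids and BLOSUM62, sequences are unweighted, and scores are left in nats rather than scaled and rounded.

```python
import math

# Hypothetical toy alphabet and frequencies (NOT real amino acid data).
ALPHABET = "ABC"
# Toy background frequencies p_i.
BACKGROUND = {"A": 0.5, "B": 0.3, "C": 0.2}
# Toy joint target frequencies q_ij; each row sums to p_i and the whole
# table sums to 1, so q_ij / p_j is the probability of i given j.
TARGET = {
    "A": {"A": 0.35, "B": 0.10, "C": 0.05},
    "B": {"A": 0.10, "B": 0.15, "C": 0.05},
    "C": {"A": 0.05, "B": 0.05, "C": 0.10},
}


def observed_frequencies(column: str) -> dict[str, float]:
    """Unweighted relative frequencies f_i of each letter in one column."""
    n = len(column)
    return {a: column.count(a) / n for a in ALPHABET}


def pseudocount_frequencies(f: dict[str, float]) -> dict[str, float]:
    """g_i = sum_j f_j * q_ij / p_j, i.e. pseudocounts spread by the matrix."""
    return {
        i: sum(f[j] * TARGET[i][j] / BACKGROUND[j] for j in ALPHABET)
        for i in ALPHABET
    }


def column_scores(column: str, alpha: float, beta: float) -> dict[str, float]:
    """Log-odds scores ln(Q_i / p_i) with Q_i = (alpha*f_i + beta*g_i)/(alpha+beta).

    alpha weights the observed frequencies; beta (> 0) is the number of
    pseudocounts. Both are supplied by the caller in this sketch.
    """
    f = observed_frequencies(column)
    g = pseudocount_frequencies(f)
    scores = {}
    for i in ALPHABET:
        q_i = (alpha * f[i] + beta * g[i]) / (alpha + beta)
        scores[i] = math.log(q_i / BACKGROUND[i])
    return scores


def relative_entropy(column: str) -> float:
    """sum_i f_i ln(f_i / p_i): a standard measure of column conservation."""
    f = observed_frequencies(column)
    return sum(f[i] * math.log(f[i] / BACKGROUND[i]) for i in ALPHABET if f[i] > 0)


if __name__ == "__main__":
    conserved = "AAAAAAAAAB"  # toy column: 10 sequences, mostly A
    alpha = len(conserved) - 1  # hypothetical stand-in for the observation weight
    print("relative entropy:", round(relative_entropy(conserved), 3))
    for beta in (10.0, 2.0):
        s = column_scores(conserved, alpha, beta)
        print(f"beta={beta}: score(A)={s['A']:.3f}, score(C)={s['C']:.3f}")
```

For this toy conserved column, lowering beta from 10 to 2 raises the score of the dominant letter and lowers the score of the absent one. This is the kind of increased inter-column contrast the abstract associates with using fewer pseudocounts in highly conserved columns. How beta should actually depend on conservation is the subject of the article and is not reproduced here.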
