Journal
NUCLEIC ACIDS RESEARCH
Volume 37, Issue 3, Pages 815-824Publisher
OXFORD UNIV PRESS
DOI: 10.1093/nar/gkn981
Keywords
-
Categories
Funding
- Intramural NIH HHS Funding Source: Medline
Ask authors/readers for more resources
Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available