Article

Improved BLAST searches using longer words for protein seeding

Journal

BIOINFORMATICS
Volume 23, Issue 21, Pages 2949-2951

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btm479

Funding

  1. Intramural NIH HHS Funding Source: Medline

Motivation: The blastp and tblastn modules of BLAST are widely used methods for searching protein queries against protein and nucleotide databases, respectively. One heuristic used in BLAST is to consider only database sequences that contain a high-scoring match of length at most 5 to the query. We implemented the capability to use words of length 6 or 7. We demonstrate an improved trade-off between running time and retrieval accuracy, controlled by the score threshold used for short word matches. For example, the running time can be reduced by 20-30% while achieving ROC (receiver operating characteristic) scores similar to those obtained with current default parameters.
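As a rough illustration of the seeding heuristic described above, the Python sketch below finds query/database position pairs whose length-w words score at least a threshold T. It is not the NCBI BLAST implementation. The scoring function (+5 for identical residues, -1 otherwise) is a hypothetical stand-in for a substitution matrix such as BLOSUM62. The brute-force loop over all word pairs also replaces BLAST's precomputed lookup table of query neighborhood words, although it yields the same set of hits. The function names `pair_score`, `word_score` and `seed_hits` are illustrative only.

```python
from typing import List, Tuple


def pair_score(a: str, b: str) -> int:
    """Hypothetical toy score: +5 for identical residues, -1 otherwise.

    Real BLAST uses a substitution matrix such as BLOSUM62.
    """
    return 5 if a == b else -1


def word_score(u: str, v: str) -> int:
    """Ungapped score of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(u, v))


def seed_hits(query: str, subject: str, w: int, T: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs whose length-w words score >= T.

    Brute force over every pair of words. BLAST instead indexes the query's
    neighborhood words in a lookup table, but the resulting hit set is the same.
    A subject with no hits would be skipped by the seeding heuristic.
    """
    hits = []
    for i in range(len(query) - w + 1):
        qword = query[i:i + w]
        for j in range(len(subject) - w + 1):
            if word_score(qword, subject[j:j + w]) >= T:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    q, s = "MKTAYIAKQR", "GGMKTAYLAKGG"
    print(seed_hits(q, s, w=3, T=11))
    print(seed_hits(q, s, w=6, T=25))
    print(seed_hits(q, s, w=6, T=20))
```

With these toy sequences, w=3 and T=11 yield three hits on the shared prefix. Raising w to 6 with T=25 yields none, because a single mismatch drops a 6-residue word below the threshold. Lowering T to 20 restores the three hits. This shows, in miniature, how the word length and score threshold together control how many seeds are produced and therefore how much downstream extension work is done.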
