4.7 Article

Accurate formula for p-values of gapped local sequence and profile alignments

Journal

JOURNAL OF MOLECULAR BIOLOGY
Volume 300, Issue 3, Pages 649-659

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1006/jmbi.2000.3875

Keywords

statistical significance; protein sequence; protein profile; sequence alignment

A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing the significance of comparisons between two protein sequences or between a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced by R. Mott and R. Tribe in 1999. Extensive simulation studies show that score thresholds produced by the method are accurate to within ±5%, 95% of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently, it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood. © 2000 Academic Press.
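For orientation, the sketch below shows how a p-value and a score threshold follow from an extreme-value distribution once its parameters are known. It uses the standard Karlin-Altschul form, P(S >= s) = 1 - exp(-K m n e^(-lambda s)), and is not the approximation derived in the paper, which the abstract does not state. The function names and the parameter values for `lam` and `K` are illustrative assumptions only.

```python
import math


def gumbel_pvalue(score, m, n, lam, K):
    """P(S >= score) under the Karlin-Altschul extreme-value form.

    m, n : lengths of the two sequences (or sequence and profile)
    lam  : scale parameter lambda (assumed known here)
    K    : search-space constant (assumed known here)
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed with expm1 for accuracy when E is tiny
    return -math.expm1(-expected_hits)


def score_threshold(p, m, n, lam, K):
    """Smallest score whose p-value is p (the inverse of gumbel_pvalue)."""
    expected_hits = -math.log1p(-p)  # solve p = 1 - exp(-E) for E
    return math.log(K * m * n / expected_hits) / lam


if __name__ == "__main__":
    # Illustrative placeholder parameters, not values taken from the paper.
    lam, K = 0.267, 0.041
    s = score_threshold(0.01, 300, 300, lam, K)
    print(f"threshold score: {s:.2f}")
    print(f"p-value at threshold: {gumbel_pvalue(s, 300, 300, lam, K):.4f}")
```

In this form an error in `lam` or `K` shifts the score threshold directly, which is why the abstract reports accuracy in terms of score thresholds.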