Journal
PROTEIN ENGINEERING DESIGN & SELECTION
Volume 23, Issue 8, Pages 617-632
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/protein/gzq030
Keywords
error estimation; homology modeling; model quality assessment; protein structure prediction; regression analysis; threading
Funding
- National Institute of General Medical Sciences of the National Institutes of Health [U24GM077905, R01GM075004]
- National Science Foundation [DMS604776, DMS800568, IIS0915801, EF0850009]
- Purdue Research Foundation
- Department of Biological Sciences, Purdue University
- National Science Foundation, Directorate for Biological Sciences, Emerging Frontiers [0850009]
Abstract
Computational protein tertiary structure prediction has made significant progress in recent years. However, most existing structure prediction methods cannot predict the accuracy of the models they construct. Knowing the accuracy of a structure model is crucial for its practical use, since accuracy determines the model's potential applications. Here we have developed quality assessment methods that predict real values of the global and local quality of protein structure models. The global quality of a model is defined as its root mean square deviation (RMSD) and its LGA score relative to its native structure. The local quality is defined as the distance between corresponding Cα positions of a model and its native structure when the two are superimposed. Three regression methods are employed to combine different types of quality assessment measures of models, including alignment-level scores, residue-position-level scores, atomic-detailed structure-level scores and composite scores. The regression models were tested on a large benchmark data set of template-based protein structure models of various qualities. In predicting RMSD and the LGA score, a combination of two terms performs well: length-normalized SPAD, a score that assesses alignment stability by considering suboptimal alignments, and Verify3D normalized by the square of the model length. This combination achieves 97.1% and 83.6% accuracy in identifying models with an RMSD of < 2 Å and < 6 Å, respectively. For predicting the local quality of models, we find that a two-step approach, in which the global RMSD predicted in the first step is further combined with the other terms, can dramatically increase the accuracy. Finally, the developed regression equations are applied to assess the quality of structure models of the whole E. coli proteome.
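The two quality targets defined in the abstract can be illustrated with a minimal sketch. It assumes the model and native Cα coordinates are already superimposed and matched residue by residue; the superposition step itself, the LGA score and the regression terms (SPAD, Verify3D) are not reproduced here. The function names `ca_distances` and `global_rmsd` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def ca_distances(model_ca, native_ca):
    """Local quality: distance (in Å) between corresponding C-alpha atoms.

    Assumption: both coordinate lists are already superimposed and aligned
    one-to-one; no fitting is performed here.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("model and native must have the same number of residues")
    return [math.dist(m, n) for m, n in zip(model_ca, native_ca)]


def global_rmsd(distances):
    """Global quality: root mean square of the per-residue deviations."""
    if not distances:
        raise ValueError("at least one residue is required")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy three-residue example (illustrative coordinates only, not real data).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]

local_quality = ca_distances(model, native)   # per-residue deviations
rmsd = global_rmsd(local_quality)             # single global value

# The 2 Å and 6 Å cutoffs are the ones quoted in the abstract.
is_high_accuracy = rmsd < 2.0
is_moderate_accuracy = rmsd < 6.0
```

In the setting the abstract describes, the native structure is unknown at prediction time, so these quantities serve as the regression targets during training and benchmarking rather than as something computed for a new model.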