Article

Active site prediction using evolutionary and structural information

Journal

BIOINFORMATICS
Volume 26, Issue 5, Pages 617-624

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btq008

Funding

  1. National Science Foundation [0238311, 0732065]
  2. National Institutes of Health [HG002769, GM35393]
  3. Department of Energy [BER KP110201]
  4. NIH/NIGMS [R01 GM071749]
  5. Direct For Biological Sciences [0732065] Funding Source: National Science Foundation
  6. Direct For Biological Sciences
  7. Div Of Biological Infrastructure [0238311] Funding Source: National Science Foundation
  8. Div Of Molecular and Cellular Bioscience [0732065] Funding Source: National Science Foundation

Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. Although a variety of computational methods have been developed for this task, their accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are actually catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues that are identified) of 57% on a standard benchmark. Here we present a new method, Discern, which significantly improves on the state of the art by using statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.

Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources, containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall by 12% to 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that use the sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction derives from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting.
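Because the benchmark comparison above is stated in terms of precision and recall at a chosen operating point, a small worked example may help. The Python sketch below computes both quantities for a set of predicted residues and shows how moving a score threshold trades one against the other. The residue indices, the per-residue scores, and the helper names `precision_recall` and `predict_at_threshold` are all hypothetical illustrations; they are not Discern's scoring model or data.

```python
from typing import Dict, Set, Tuple


def precision_recall(predicted: Set[int], catalytic: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted catalytic residues (by residue index)."""
    true_positives = len(predicted & catalytic)
    # Precision: fraction of predicted residues that are truly catalytic.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of truly catalytic residues that were predicted.
    recall = true_positives / len(catalytic) if catalytic else 0.0
    return precision, recall


def predict_at_threshold(scores: Dict[int, float], threshold: float) -> Set[int]:
    """Call a residue catalytic when its (hypothetical) score meets the threshold."""
    return {residue for residue, score in scores.items() if score >= threshold}


# Hypothetical per-residue scores and a hypothetical set of annotated catalytic residues.
scores = {57: 0.91, 102: 0.84, 140: 0.62, 195: 0.40, 214: 0.35}
catalytic = {57, 102, 195}

if __name__ == "__main__":
    # Lowering the threshold raises recall at the cost of precision.
    for t in (0.8, 0.6, 0.3):
        p, r = precision_recall(predict_at_threshold(scores, t), catalytic)
        print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

With these toy numbers the thresholds 0.8, 0.6 and 0.3 give (precision, recall) of (1.00, 0.67), (0.67, 0.67) and (0.60, 1.00). This is why a comparison such as "recall at a fixed precision" is needed to rank methods fairly.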
