Graphlet Kernels for Prediction of Functional Residues in Protein Structures

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 17, Issue 1, Pages 55-72

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2009.0029

Keywords

algorithms; graphs; kernel methods; machine learning; protein structure; protein function

Funding

  1. NIH [1R21CA113711]
  2. NSF [IIS-0447773, DBI-0321756, DBI-0644017]
  3. NATIONAL CANCER INSTITUTE [R21CA113711] Funding Source: NIH RePORTER

We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets) centered on the vertex of interest. The similarity between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, a well-studied problem suitable for benchmarking, and the much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods: a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin was considerably larger on the problem of phosphorylation site prediction. Although phosphorylation sites are reported to be preferentially positioned in intrinsically disordered regions, we provide evidence that, for sites located in structured regions, neither surface accessibility alone nor the averaged measures calculated from the residue microenvironments used by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures by enumerating the patterns of local connectivity in the corresponding labeled graphs.
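The pipeline described in the abstract (contact graph, per-vertex labeled graphlet counts, inner-product kernel) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the distance cutoff, the maximum graphlet size, or any normalization, so the 6.0 Å cutoff, the restriction to graphlets of at most three vertices, and the function names `contact_graph`, `graphlet_counts`, and `graphlet_kernel` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def contact_graph(coords, cutoff=6.0):
    """Connect residues whose representative atoms lie within `cutoff`.

    The 6.0 cutoff is an assumed value, not one taken from the abstract.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graphlet_counts(adj, labels, v):
    """Count labeled graphlets of up to 3 vertices that contain vertex v.

    Keys record the graphlet shape, the label of v, and the labels of the
    other vertices; labels in symmetric positions are sorted so that
    isomorphic labeled graphlets share one key.
    """
    counts = Counter()
    lv = labels[v]
    counts[("node", lv)] += 1
    for u in adj[v]:
        counts[("edge", lv, labels[u])] += 1
    # v together with two of its neighbours: triangle or path centred on v.
    for a, b in combinations(sorted(adj[v]), 2):
        pair = tuple(sorted((labels[a], labels[b])))
        kind = "triangle" if b in adj[a] else "path_center"
        counts[(kind, lv) + pair] += 1
    # Path v - a - b where b is not itself a neighbour of v.
    for a in adj[v]:
        for b in adj[a]:
            if b != v and b not in adj[v]:
                counts[("path_end", lv, labels[a], labels[b])] += 1
    return counts


def graphlet_kernel(c1, c2):
    """Inner product of two sparse graphlet count vectors."""
    return sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())


# Toy example: four residues on a line, 5 units apart, forming a path graph.
coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
labels = ["S", "A", "S", "G"]  # one-letter residue types
adj = contact_graph(coords)
c0 = graphlet_counts(adj, labels, 0)
c2 = graphlet_counts(adj, labels, 2)
print(graphlet_kernel(c0, c2))
```

In this sketch, two residues score highly when they share the same residue type and similar labeled neighborhoods. Vertices with different labels always score zero, because every key includes the label of the central vertex. A kernel matrix computed this way could be passed to any kernel-based classifier that accepts precomputed kernels.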
