4.3 Article

Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins

Journal

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
Volume 83, Issue 12, Pages 2198-2216

Publisher

WILEY
DOI: 10.1002/prot.24936

Keywords

protein structure; amino acid sequence; contact potential; knowledge-based potential; sequence representation; threading

Ask authors/readers for more resources

To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much of the structural information found in long-range (contact) interactions among amino acids in natively-folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of less than 10) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches-including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs-fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. (C) 2015 Wiley Periodicals, Inc.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.3
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available