Journal
BIOINFORMATICS
Volume 30, Issue 6, Pages 846-851
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt619
Funding
- Bulgarian Science Fund [DCVNP 02-1/2009, IO 7/1]
Motivation: Allergenicity, like antigenicity and immunogenicity, is a property encoded both linearly and non-linearly; therefore, alignment-based approaches cannot identify it unambiguously. A novel alignment-free, descriptor-based fingerprint approach is presented here and applied to discriminate allergens from non-allergens. The approach is implemented as a four-step algorithm. First, protein sequences are described by amino acid principal properties (E-descriptors): hydrophobicity, size, relative abundance, and helix- and beta-strand-forming propensities. Second, the resulting descriptor strings, which differ in length, are converted into vectors of equal length by auto- and cross-covariance (ACC). Third, the ACC vectors are transformed into binary fingerprints. Fourth, the fingerprints are compared using the Tanimoto coefficient.

Results: The approach was applied to a set of 2427 known allergens and 2427 non-allergens and correctly classified 88% of them, with a Matthews correlation coefficient (MCC) of 0.759. The descriptor fingerprint approach presented here is universal and could be applied to any classification problem in computational biology. The set of E-descriptors captures the main structural and physicochemical properties of the amino acids that make up proteins. The ACC transformation overcomes the main problem of alignment-based comparative studies, which arises from the differing lengths of the aligned protein sequences. Converting protein ACC values into binary descriptor fingerprints enables similarity search and classification.
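The four-step algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The numbers in `TOY_DESCRIPTORS` are hypothetical placeholders, not the published E-descriptor table. The lag range, the equal-width binning used to binarize ACC values, and the nearest-neighbour classification rule are also assumptions, because the abstract does not specify them. All function names (`describe`, `acc`, `fingerprint`, `tanimoto`, `classify`, `encode`) are hypothetical.

```python
from itertools import product

# HYPOTHETICAL placeholder descriptors (5 components per residue) for a few
# amino acids. These are NOT the published E-descriptor values; replace them
# with the real table (hydrophobicity, size, relative abundance, helix and
# beta-strand forming propensities) before any real use.
TOY_DESCRIPTORS = {
    "A": [0.1, -0.2, 0.3, 0.5, -0.1],
    "G": [-0.3, -0.5, 0.2, -0.4, 0.0],
    "L": [0.6, 0.3, 0.1, 0.4, 0.2],
    "K": [-0.6, 0.2, 0.0, 0.1, -0.3],
    "V": [0.5, 0.1, 0.0, -0.1, 0.6],
}


def describe(seq, table):
    """Step 1: map each residue to its descriptor vector.
    Residues missing from the table are skipped (an assumption)."""
    return [table[aa] for aa in seq if aa in table]


def acc(vectors, max_lag):
    """Step 2: auto- and cross-covariance.
    ACC[j,k](lag) = sum_i E[j][i] * E[k][i+lag] / (n - lag) for every ordered
    descriptor pair (j, k) and lag = 1..max_lag; j == k gives auto-covariance,
    j != k cross-covariance. Output length is d*d*max_lag for any n > max_lag."""
    n = len(vectors)
    if n <= max_lag:
        raise ValueError("sequence too short for the requested lag")
    d = len(vectors[0])
    out = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            s = sum(vectors[i][j] * vectors[i + lag][k] for i in range(n - lag))
            out.append(s / (n - lag))
    return out


def fingerprint(acc_values, lo=-1.0, hi=1.0, n_bins=10):
    """Step 3 (ASSUMED binarization): clamp each ACC value to [lo, hi], drop it
    into one of n_bins equal-width bins and set that single bit. The fingerprint
    is the frozenset of 'on' bit positions."""
    width = (hi - lo) / n_bins
    bits = set()
    for pos, v in enumerate(acc_values):
        b = int((min(max(v, lo), hi) - lo) / width)
        bits.add(pos * n_bins + min(b, n_bins - 1))  # v == hi goes to last bin
    return frozenset(bits)


def tanimoto(fp1, fp2):
    """Step 4: Tanimoto coefficient c / (a + b - c), where a and b are the bits
    set in each fingerprint and c the bits they share."""
    common = len(fp1 & fp2)
    union = len(fp1) + len(fp2) - common
    return common / union if union else 1.0


def classify(query_fp, reference):
    """ASSUMED nearest-neighbour rule: return the label of the reference
    fingerprint with the highest Tanimoto similarity to the query."""
    _, label = max(reference, key=lambda item: tanimoto(query_fp, item[0]))
    return label


def encode(seq, max_lag=2):
    """Steps 1-3 chained; max_lag=2 is an arbitrary illustrative choice."""
    return fingerprint(acc(describe(seq, TOY_DESCRIPTORS), max_lag))


# Usage: two toy "reference" proteins with made-up labels, one query.
refs = [(encode("ALKVGALKVA"), "allergen"), (encode("GGKGGKGGKG"), "non-allergen")]
print(classify(encode("ALKVGALKVG"), refs))
```

Every ACC vector here has d*d*max_lag entries (5*5*2 = 50), whatever the sequence length. This is the property that lets fingerprints of proteins with different lengths be compared directly by Tanimoto similarity without any alignment step.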