Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2007)

Journal

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS

Volume 66, Issue 4, Pages 838-845

Publisher

WILEY

DOI: 10.1002/prot.21298

Keywords

solvent accessibility; solvent accessible surface area; neural network

Funding

NIGMS NIH HHS [R01 GM 068530, R01 GM 966049] Funding Source: Medline

Abstract

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. In this paper, SPINE is applied to three-state secondary-structure prediction and residue solvent accessibility (RSA) prediction. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one month of training (CPU time) on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA > 95%), which approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoffs) and 79.3% for two-state prediction (25% cutoff) is also obtained.
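The two evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the SPINE code: the function names `q3` and `rsa_state`, the H/E/C state letters, the state labels, and the handling of values that fall exactly on a cutoff are all assumptions made here for illustration, since the abstract does not specify them.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Both arguments are strings over a three-letter alphabet
    (assumed here: H = helix, E = strand, C = coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_state(rsa_percent: float, cutoffs=(25.0, 75.0)) -> str:
    """Map a relative solvent accessibility (in percent) to one of three states.

    The 25%/75% cutoffs come from the abstract; treating the cutoff
    values themselves as belonging to the higher state is an assumption.
    """
    low, high = cutoffs
    if rsa_percent < low:
        return "buried"
    if rsa_percent < high:
        return "intermediate"
    return "exposed"


if __name__ == "__main__":
    # Toy example: 7 of 8 residues agree between prediction and observation.
    print(q3("HHHEECCC", "HHHECCCC"))
    print([rsa_state(v) for v in (10.0, 50.0, 90.0)])
```

A two-state variant (25% cutoff) would follow the same pattern with a single threshold.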
