Article; Proceedings Paper

Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2020.2966450

Keywords

Proteins; Support vector machines; Feature extraction; Machine learning; Training; Hidden Markov models; Protein engineering; Protein fold recognition; SVMs; Template-based method; Pairwise sequence similarity scores

Funding

  1. National Natural Science Foundation of China [61822306, 61672184, 61876051]
  2. Beijing Natural Science Foundation [JQ19019]
  3. Fok Ying-Tung Education Foundation for Young Teachers in the Higher Education Institutions of China [161063]
  4. Scientific Research Foundation in Shenzhen [JCYJ20150626110425228, JCYJ20170307152201596, JCYJ20180306172207178]
  5. Guangzhou Science and Technology Planning Project [201804010347]
  6. National Postdoctoral Program for Innovative Talent [BX20190100]
  7. Key laboratory project of Shenzhen Municipal Science and Technology Innovation Council [ZDSYS20190902093015527]


The study proposed two novel algorithms for protein fold recognition, TSVM-fold and ESVM-fold, which use sequence similarity scores generated by multiple template-based methods. Experimental results showed that these algorithms outperform some state-of-the-art methods on rigorous benchmark datasets.
Protein fold recognition is one of the most essential steps in protein structure prediction; it aims to classify proteins into known protein folds. There are two main computational approaches: template-based methods, which rely on alignment scores between query-template protein pairs, and machine learning methods, which rely on a feature representation and a classifier. Each approach has its own advantages and disadvantages. Can the two be combined to build more accurate predictors for protein fold recognition? In this study, we made an initial attempt and proposed two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and uses a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates. The resulting comprehensive features describing the sequences were fed into the SVMs for prediction. TSVM-fold was then combined with the HHblits algorithm to improve its generalization ability; the combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (the LE and YK datasets) showed that the proposed methods outperform some state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
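
The idea of feeding pairwise similarity scores into an SVM can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no details on score normalization, kernel, or multi-class strategy, so the sketch assumes a binary problem (fold A versus fold B), a linear SVM trained by plain subgradient descent, and invented score values. The names `build_features`, `train_linear_svm`, `predict`, and the toy tables are all hypothetical; real scores would come from running HHblits, SPARKS-X, and DeepFR.

```python
# Toy illustration only: all scores below are invented, not tool output.
# Each table maps query -> template -> similarity score for one method.
HHBLITS = {"q1": {"t1": 0.9, "t2": 0.1}, "q2": {"t1": 0.1, "t2": 0.9},
           "q3": {"t1": 0.8, "t2": 0.2}, "q4": {"t1": 0.2, "t2": 0.8},
           "q5": {"t1": 0.7, "t2": 0.3}}
SPARKS_X = {"q1": {"t1": 0.8, "t2": 0.2}, "q2": {"t1": 0.2, "t2": 0.8},
            "q3": {"t1": 0.9, "t2": 0.1}, "q4": {"t1": 0.1, "t2": 0.9},
            "q5": {"t1": 0.8, "t2": 0.2}}
DEEPFR = {"q1": {"t1": 0.9, "t2": 0.2}, "q2": {"t1": 0.2, "t2": 0.9},
          "q3": {"t1": 0.8, "t2": 0.1}, "q4": {"t1": 0.1, "t2": 0.8},
          "q5": {"t1": 0.7, "t2": 0.2}}

SCORE_TABLES = [HHBLITS, SPARKS_X, DEEPFR]
TEMPLATES = ["t1", "t2"]  # t1 is assumed to belong to fold A, t2 to fold B


def build_features(score_tables, query, templates):
    """Concatenate the query's scores against every template, one block per method."""
    return [table[query][t] for table in score_tables for t in templates]


def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Binary linear SVM: subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regulariser shrinks the weights.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(model, x):
    """Return +1 (fold A) or -1 (fold B) from the sign of the decision value."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


train_queries = ["q1", "q2", "q3", "q4"]
y_train = [1, -1, 1, -1]  # +1 = fold A, -1 = fold B (toy labels)
X_train = [build_features(SCORE_TABLES, q, TEMPLATES) for q in train_queries]

model = train_linear_svm(X_train, y_train)
q5_label = predict(model, build_features(SCORE_TABLES, "q5", TEMPLATES))
```

Here each query becomes a six-dimensional vector (three methods times two templates), so the classifier sees the query's relationship to the whole template set rather than only its single best hit. A realistic fold recognizer would have many folds and far more templates; how ESVM-fold combines TSVM-fold with HHblits is not described in the abstract and is not sketched here.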
