Article

DEEPStack-RBP: Accurate identification of RNA-binding proteins based on autoencoder feature selection and deep stacking ensemble classifier

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 256

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.109875

Keywords

RNA-binding proteins; Multi-information fusion; Bidirectional long short-term memory; Gated recurrent unit; Autoencoder; Stacked ensemble classifier

Funding

  1. National Natural Science Foundation of China [62172248, 61863010]
  2. Natural Science Foundation of Shandong Province of China [ZR2021MF098]
  3. Key Laboratory Open Foundation of Hainan Province, China [JSKX202001]

This article introduces DEEPStack-RBP, an RBP prediction tool based on deep learning and ensemble learning. The tool combines several feature extraction methods, reduces feature redundancy with an autoencoder, and balances positive and negative samples before prediction. Experimental results show that DEEPStack-RBP achieves high accuracy and MCC values on the training dataset and on three independent test datasets.
RNA-binding proteins (RBPs) are involved in a number of biological processes, such as RNA synthesis, protein folding, and alternative splicing. Predicting RBPs can facilitate the discovery and treatment of human diseases, such as muscle atrophy, nervous system diseases, and cancer. However, identifying RBPs with experimental methods still poses various challenges. Computational methods, and deep learning in particular, are being deployed to alleviate some of these challenges and provide new avenues of investigation in RBP prediction. Here, we propose DEEPStack-RBP, a novel RBP prediction tool based on deep learning and ensemble learning. First, conjoint triad (CT), local descriptors (LD), pseudo amino acid composition (PseAAC), multivariate mutual information (MMI), and position-specific scoring matrix-transition probability composition (PSSM-TPC) are applied to extract multiple features from the proteins. Subsequently, an autoencoder (AE) is used to eliminate redundancy in the features, and SMOTE-ENN is employed to balance the samples by minimizing the difference in number between positive and negative cases. Finally, a stacked ensemble classifier composed of bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and support vector machine (SVM) is used for prediction. On the training dataset RBP9873, the ACC of DEEPStack-RBP reaches 98.76% with an MCC of 0.9508. For the three independent test datasets of Human, S. cerevisiae, and A. thaliana, the accuracy of the model is 97.16%, 97.67%, and 99.57%, respectively, and the MCC is 0.9405, 0.9499, and 0.9906, respectively. These results show that DEEPStack-RBP can be used as a powerful tool for RBP prediction. © 2022 Elsevier B.V. All rights reserved.
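
To make the first feature-extraction step more concrete, the following is a minimal sketch of a conjoint triad (CT) encoding in plain Python. It is not the authors' implementation. The seven residue classes and the min-max style scaling follow the commonly cited CT formulation and are assumptions here, since the abstract does not give these details. The function name `conjoint_triad` and the choice to skip non-standard residues are hypothetical.

```python
from itertools import product

# Seven residue classes from the commonly cited conjoint triad scheme
# (grouped by dipole and side-chain volume). Assumed, not taken from the paper.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {
    aa: str(index)
    for index, group in enumerate(CLASSES, start=1)
    for aa in group
}

# All 7 * 7 * 7 = 343 ordered triads of class labels, e.g. "111", "112", ...
TRIADS = ["".join(labels) for labels in product("1234567", repeat=3)]


def conjoint_triad(sequence):
    """Return a 343-dimensional scaled triad-frequency vector for a protein."""
    # Map each residue to its class label; residues outside the 20 standard
    # amino acids are skipped (a simplifying assumption of this sketch).
    encoded = [
        RESIDUE_TO_CLASS[aa] for aa in sequence.upper() if aa in RESIDUE_TO_CLASS
    ]

    # Count every window of three consecutive class labels.
    counts = dict.fromkeys(TRIADS, 0)
    for start in range(len(encoded) - 2):
        counts["".join(encoded[start:start + 3])] += 1

    values = [counts[triad] for triad in TRIADS]
    lowest, highest = min(values), max(values)
    if highest == 0:
        # Sequence too short to contain any triad.
        return [0.0] * len(TRIADS)

    # Scale as (f - min) / max so proteins of different lengths are comparable.
    return [(value - lowest) / highest for value in values]


if __name__ == "__main__":
    features = conjoint_triad("MKVLAAGIVGRK")  # toy sequence, not real data
    print(len(features))
```

In a pipeline like the one described, a vector of this kind would be concatenated with the LD, PseAAC, MMI, and PSSM-TPC features before the autoencoder step. That wiring is not shown here.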
