4.7 Article

Identification of adaptor proteins by incorporating deep learning and PSSM profiles

期刊

METHODS
卷 209, 期 -, 页码 10-17

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ymeth.2022.11.001

关键词

PSSM; biLSTM; Adaptor proteins; Deep learning

向作者/读者索取更多资源

Adaptor proteins play an important role in signal transduction pathways and are closely related to diseases. Previous studies used PSSM and neural network methods to identify adaptor proteins, but ordinary neural networks cannot handle the contextual information in PSSM profiles well. This study proposes an efficient method combining 1-D CNN and biLSTM, which shows better performance in identifying adaptor proteins.
Adaptor proteins, also known as signal transduction adaptor proteins, are important proteins in signal trans-duction pathways, and play a role in connecting signal proteins for signal transduction between cells. Studies have shown that adaptor proteins are closely related to some diseases, such as tumors and diabetes. Therefore, it is very meaningful to construct a relevant model to accurately identify adaptor proteins. In recent years, many studies have used a position-specific scoring matrix (PSSM) and neural network methods to identify adaptor proteins. However, ordinary neural network models cannot correlate the contextual information in PSSM profiles well, so these studies usually process 20 x N (N > 20) PSSM into 20 x 20 dimensions, which results in the loss of a large amount of protein information; This research proposes an efficient method that combines one-dimensional convolution (1-D CNN) and a bidirectional long short-term memory network (biLSTM) to identify adaptor proteins. The complete PSSM profiles are the input of the model, and the complete information of the protein is retained during the training process. We perform cross-validation during model training and test the performance of the model on an independent test set; in the data set with 1224 adaptor proteins and 11,078 non-adaptor proteins, five indicators including specificity, sensitivity, accuracy, area under the receiver operating characteristic curve (AUC) metric and Matthews correlation coefficient (MCC), were employed to evaluate model performance. On the independent test set, the specificity, sensitivity, accuracy and MCC were 0.817, 0.865, 0.823 and 0.465, respectively. Those results show that our method is better than the state-of-the art methods. This study is committed to improve the accuracy of adaptor protein identification, and laid a foundation for further research on diseases related to adaptor protein. This research provided a new idea for the application of deep learning related models in bioinformatics and computational biology.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据