4.5 Article

Predicting S-nitrosylation proteins and sites by fusing multiple features

Journal

MATHEMATICAL BIOSCIENCES AND ENGINEERING
Volume 18, Issue 6, Pages 9132-9147

Publisher

AMER INST MATHEMATICAL SCIENCES-AIMS
DOI: 10.3934/mbe.2021450

Keywords

S-nitrosylation; random forest; post-translational modification; multiple features; identification

Funding

  1. National Natural Science Foundation of China [31760315, 62162032, 61761023]
  2. Natural Science Foundation of Jiangxi Province, China [20202BAB202007]

Two models were proposed for identifying S-nitrosylation proteins and their PTM sites. Features were extracted from protein sequences, the synthetic minority oversampling technique was used to balance the data sets, and state-of-the-art classifiers were combined with feature fusion strategies, giving promising results in five-fold cross-validation tests.
Protein S-nitrosylation is one of the most important post-translational modifications (PTMs). Because it plays a key role in a variety of biological processes, a well-grounded understanding of it is important. For an uncharacterized protein sequence, it is valuable for both basic research and drug development to first identify whether the sequence is an S-nitrosylation protein and then predict its specific S-nitrosylation site(s). This work proposes two models, one for identifying S-nitrosylation proteins and one for identifying their PTM sites. For protein prediction, three kinds of features are first extracted from the protein sequence: KNN scoring of functional domain annotation, pseudo amino acid composition (PseAAC), and bag-of-words based on the physical and chemical properties of amino acids. The synthetic minority oversampling technique is then used to balance the data sets, and several state-of-the-art classifiers and feature fusion strategies are evaluated on the balanced data sets. In five-fold cross-validation for predicting S-nitrosylation proteins, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) are 81.84%, 0.5178, and 0.8635, respectively. For site prediction, a model is constructed on the basis of tripeptide composition (TPC) and the composition of k-spaced amino acid pairs (CKSAAP). To eliminate redundant information and improve efficiency, an elastic net is employed for feature selection. Five-fold cross-validation tests indicate promising success rates for the proposed model. For the convenience of related researchers, a web server named RF-SNOPS has been established at http://www.jci-bioinfo.cn/RF-SNOPS
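
The two site-level encodings named in the abstract, TPC and CKSAAP, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `tpc` and `cksaap`, the maximum gap `k_max=2`, and the choice to normalize by the number of valid residue pairs are assumptions made for illustration, since the abstract does not state the window size or gap range used.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tpc(seq):
    """Tripeptide composition: relative frequency of each of the
    20**3 = 8000 possible tripeptides in `seq`.
    Tripeptides containing non-standard residues are skipped (assumption)."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in keys]


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.
    For each gap k, counts pairs (seq[i], seq[i + k + 1]) and normalizes
    by the number of valid pairs. `k_max=2` is a hypothetical default."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


# Usage on a made-up peptide window centered on a cysteine (illustrative only).
window = "AKLGCDEAC"
fused = tpc(window) + cksaap(window)  # simple concatenation of both encodings
print(len(fused))  # 8000 TPC features + 3 * 400 CKSAAP features = 9200
```

Concatenating the two encodings gives a sparse, high-dimensional vector, which is consistent with the abstract's motivation for applying feature selection before classification.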
