4.6 Article

Identification of DNA-Binding Proteins via Hypergraph Based Laplacian Support Vector Machine

Journal

CURRENT BIOINFORMATICS
Volume 17, Issue 1, Pages 108-117

Publisher

BENTHAM SCIENCE PUBL LTD
DOI: 10.2174/1574893616666210806091922

Keywords

DNA-binding proteins; feature extraction; Laplacian support vector machine; multiple kernel learning; hypergraph learning; PDB

Funding

  1. National Natural Science Foundation of China [NSFC 61902271, 62172076, 62073231, 61772357]
  2. Natural Science Research of Jiangsu Higher Education Institutions of China [19KJB520014]
  3. Special Science Foundation of Quzhou [2020D003, 2021D004]

In this study, we developed a sequence-based machine learning model to predict DNA-binding proteins (DBPs). By extracting six types of features and combining multiple kernel learning with a hypergraph model, our model achieved good predictive accuracy on multiple data sets.
Background: The identification of DNA-binding proteins (DBPs) is an important research field. Experiment-based methods for detecting DBPs are time-consuming and labor-intensive.
Objective: Several machine learning methods have been proposed for large-scale DBP identification, but their predictive accuracy remains insufficient. Our aim is to develop a sequence-based machine learning model to predict DBPs.
Methods: We extracted six types of features (NMBAC, GE, MCD, PSSM-AB, PSSM-DWT, and PsePSSM) from protein sequences. We used Multiple Kernel Learning based on the Hilbert-Schmidt Independence Criterion (MKL-HSIC) to estimate the optimal kernel. We then constructed a hypergraph model to describe the relationship between labeled and unlabeled samples. Finally, a Laplacian Support Vector Machine (LapSVM) was employed to train the predictive model. Our method was tested on the PDB186, PDB1075, PDB2272, and PDB14189 data sets.
Results: Compared with other methods, our model achieved the best results on the benchmark data sets.
Conclusion: Accuracies of 87.1% and 74.2% were achieved on PDB186 (the independent test set for PDB1075) and PDB2272 (the independent test set for PDB14189), respectively.
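
The hypergraph step in the Methods can be illustrated with a minimal sketch. The abstract does not say how hyperedges are formed or weighted, so everything below is an assumption for illustration only: each sample spawns one hyperedge containing itself and its k nearest neighbours, all hyperedge weights are 1, and the Laplacian takes a commonly used normalized form, L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}. The function names `knn_hypergraph_incidence` and `hypergraph_laplacian` are hypothetical and do not come from the paper.

```python
import numpy as np

def knn_hypergraph_incidence(X, k=3):
    """Incidence matrix H (n_samples x n_edges), ASSUMED construction:
    hyperedge j holds sample j and its k nearest neighbours (Euclidean)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    H = np.zeros((n, n))
    for j in range(n):
        nearest = np.argsort(d2[j])[: k + 1]
        H[nearest, j] = 1.0
        H[j, j] = 1.0  # make sure the centre sample belongs to its own edge
    return H

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2.
    Unit hyperedge weights are an assumption when w is not given."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w           # vertex degrees (weighted count of incident edges)
    de = H.sum(axis=0)   # edge degrees (number of vertices in each edge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n) - theta

# Toy usage: random vectors stand in for labeled and unlabeled protein features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 5))
L_demo = hypergraph_laplacian(knn_hypergraph_incidence(X_demo, k=3))
```

In a LapSVM-style model, a matrix like `L_demo` would enter the objective as a smoothness regularizer over labeled and unlabeled samples together; how the authors combine it with the MKL-HSIC kernel is not stated in the abstract.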
