Article

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Journal

BMC BIOINFORMATICS
Volume 18, Issue -, Pages -

Publisher

BIOMED CENTRAL LTD
DOI: 10.1186/s12859-017-1715-8

Keywords

SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence

Funding

  1. Science and Technology Research Key Project of Educational Department of Henan Province [16A520016, 17B520002, 17B520036]
  2. Key Project of Science and Technology Department of Henan Province [142102210056]
  3. National Natural Science Foundation of China [61402153]
  4. China Postdoctoral Science Foundation
  5. Ph.D. Research Startup Foundation of Henan Normal University [qd15130, qd15132, qd15129]

Background: DNA-binding proteins perform important functions in a great number of biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes for representing protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.

Results: Our results suggest that some sequence features differ significantly between DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an area under the curve (AUC) of 0.919. Moreover, our method performs well in independent testing.

Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could be helpful for discovering the binding specificity of DNA-binding proteins.
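As an illustration of two of the sequence representations named in the abstract, the following minimal Python sketch computes an overall amino acid composition vector (20 values) and a dipeptide composition vector (400 values) for a single sequence. It is not the authors' implementation: the function names, the handling of non-standard residues, and the toy sequence are assumptions made for this example, and the PSSM and split amino acid composition features are not covered.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid (20 values).

    Assumption for this sketch: non-standard residues (e.g. X, B, Z)
    are ignored rather than counted.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return the fraction of each adjacent residue pair (400 values).

    Assumption for this sketch: a pair is counted only when both
    residues are standard amino acids.
    """
    seq = sequence.upper()
    pairs = [
        a + b
        for a, b in zip(seq, seq[1:])
        if a in AMINO_ACIDS and b in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    return [
        counts[a + b] / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


# Toy sequence, chosen only for demonstration (hypothetical, not from the paper).
toy_sequence = "MKTAYIAKQR"
features = amino_acid_composition(toy_sequence) + dipeptide_composition(toy_sequence)
print(len(features))  # 420
```

A vector of this kind, computed for every protein in a labeled dataset, is the sort of input that could be passed to an SVM or random forest classifier; the feature combination and classifier settings actually used in the study are not stated in the abstract.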
