4.8 Article

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Journal

NUCLEIC ACIDS RESEARCH
Volume 40, Issue 14, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gks283

Keywords

-

Funding

  1. ICT Department of Ghent University
  2. Flanders Institute for Biotechnology (VIB)
  3. Research Foundation Flanders (FWO)
  4. Agency for Innovation through Science and Technology in Flanders (IWT) [SB-091213]

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We use the random forest algorithm to flexibly exploit both types of information. Our results show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, both to each other and to the widely used position weight matrix (PWM) method. This led us to combine all three methods. For five eukaryotic TFs with different DNA-binding domains, the combined method improves classification accuracy over other approaches in every case. Additionally, we contrast these results with those for seven smaller prokaryotic sets with high-quality data and show that the use of high-quality data significantly improves prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
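To make the PWM baseline mentioned in the abstract concrete, the following is a minimal sketch of log-odds PWM scoring in plain Python. It is not the authors' implementation: the function names (`build_pwm`, `score_window`, `best_hit`), the pseudocount of 1, the uniform background of 0.25 and the example sites are all assumptions chosen for illustration. In an integrative setup like the one described, such a score could plausibly serve as one input feature to a random forest alongside structural and NPD features, but this record does not specify the features actually used.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned, equal-length sites.

    pseudocount and background are illustrative assumptions, not values from the paper.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds for a window as wide as the PWM."""
    if len(window) != len(pwm):
        raise ValueError("window width must equal PWM width")
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best_score, start_index)."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites, invented for this example.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
example_score, example_start = best_hit(example_pwm, "GGGTATAATCCC")
```

Because a PWM sums independent per-position scores, it cannot capture dependencies between positions. That limitation is what the NPD and structural information described in the abstract is meant to address.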
