Article

Bayesian Markov models improve the prediction of binding motifs beyond first order

Journal

NAR GENOMICS AND BIOINFORMATICS
Volume 3, Issue 2

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nargab/lqab026

Funding

  1. DFG [SPP1935 CR 117/6-1]
  2. International Max Planck Research School for Genome Science [IMPRS-GS]

Accurately predicting transcription factor (TF) binding affinities is crucial for understanding transcriptional regulation, and models that can learn dependencies between positions yield improved predictions. However, these models are also more prone to overfitting and to learning patterns that are merely correlated with TF binding.
Transcription factors (TFs) regulate gene expression by binding to specific DNA motifs. Accurate models for predicting binding affinities are crucial for a quantitative understanding of transcriptional regulation. Motifs are commonly described by position weight matrices, which assume that each position contributes independently to the binding energy. Models that can learn dependencies between positions, for instance those induced by DNA structure preferences, have yielded markedly improved predictions for most TFs on in vivo data. However, they are more prone to overfitting the data and to learning patterns that are merely correlated with, rather than directly involved in, TF binding. We present an improved, faster version of our Bayesian Markov model software, BaMMmotif2. We tested it against state-of-the-art motif discovery tools on a large collection of ChIP-seq and HT-SELEX datasets. Fifth-order BaMMmotif2 models achieved a median false-discovery-rate-averaged recall 13.6% and 12.2% higher than the next-best tool on 427 ChIP-seq datasets and 164 HT-SELEX datasets, respectively, while being 8 to 1000 times faster. BaMMmotif2 models showed no signs of overtraining in cross-cell-line and cross-platform tests, with similar improvements over the next-best tool. These results demonstrate that dependencies beyond first order clearly improve binding models for most TFs.
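The contrast between a position weight matrix and a model with dependencies between positions can be illustrated with a minimal Python sketch. It is not BaMMmotif2 code and does not reproduce its Bayesian training or its fifth-order models. The two-position motif, the probability tables, the uniform background, and the function names `pwm_log_odds` and `first_order_log_odds` are all assumptions made up for illustration.

```python
import math

BACKGROUND = 0.25  # assumption: uniform background probability per nucleotide


def pwm_log_odds(seq, pwm):
    """Zeroth-order (PWM) score: every position contributes independently."""
    return sum(math.log(pwm[i][base] / BACKGROUND) for i, base in enumerate(seq))


def first_order_log_odds(seq, start, cond):
    """First-order score: each base is conditioned on the preceding base."""
    score = math.log(start[seq[0]] / BACKGROUND)
    for i in range(1, len(seq)):
        score += math.log(cond[i][seq[i - 1]][seq[i]] / BACKGROUND)
    return score


# Hypothetical toy motif: bound sites are "AA" or "TT", almost never "AT" or "TA".
column = {"A": 0.49, "C": 0.01, "G": 0.01, "T": 0.49}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

pwm = [column, column]  # per-position marginals hide the dependency
start = column          # distribution of the first base
cond = [
    None,  # position 0 has no predecessor
    {
        "A": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
        "T": {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
        "C": uniform,
        "G": uniform,
    },
]

for site in ("AA", "AT"):
    print(site, pwm_log_odds(site, pwm), first_order_log_odds(site, start, cond))
```

In this toy setting the PWM assigns "AA" and "AT" the same score, because it only sees the per-position marginals, while the first-order model separates them. Higher-order models extend the conditioning to more preceding bases, which also increases the number of parameters and hence the risk of overfitting noted in the abstract.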