4.8 Article

Theoretical and empirical quality assessment of transcription factor-binding motifs

Journal

NUCLEIC ACIDS RESEARCH
Volume 39, Issue 3, Pages 808-824

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkq710

Funding

  1. Consejo Nacional de Ciencia y Tecnologia (Mexico)
  2. European Communities [LSHG-CT-2003-503265]
  3. Belgian Federal Science Policy Office [P6/25 (BioMaGNet)]
  4. European Commission [222886-2]
  5. Actions de Recherches Concertees de la Communaute Francaise de Belgique (ARC) [04/09-307]
  6. Bureau des Relations Internationales et de Cooperation (BRIC, Universite Libre de Bruxelles)
  7. UNAM
  8. National Institutes of Health [R01 GM071962-05]
  9. Alexander von Humboldt Stiftung

Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability in predicting novel binding sites can be far from optimal, owing to a small number of training sites or an inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program 'matrix-quality', that combines theoretical and empirical score distributions to assess the reliability of PSSMs for predicting TF-binding sites. We applied 'matrix-quality' to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq (mouse TFs) experiments. The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TF databases; evaluating motifs discovered using high-throughput data sets.
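
To make the idea of combining theoretical and empirical score distributions concrete, here is a minimal Python sketch. It is not the 'matrix-quality' implementation: all function names, the toy counts, the pseudo-count and the independent-bases background are assumptions made for illustration. It derives the exact score distribution of a small PSSM under the background model, derives the observed distribution from scanning a sequence, and compares the fraction of windows at or above a threshold in each.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def log_odds_matrix(counts, background, pseudo=1.0):
    """Convert per-position base counts into integer log-odds weights.

    Weights are stored in units of 0.1 nat so that scores are exact integers.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudo
        row = {}
        for base in BASES:
            freq = (column.get(base, 0) + pseudo * background[base]) / total
            row[base] = round(10 * math.log(freq / background[base]))
        matrix.append(row)
    return matrix

def theoretical_distribution(matrix, background):
    """Exact P(score) for one window, assuming independent background bases."""
    dist = {0: 1.0}
    for row in matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base in BASES:
                nxt[score + row[base]] += prob * background[base]
        dist = dict(nxt)
    return dist

def empirical_distribution(matrix, sequence):
    """Observed score frequencies over all windows of one strand.

    The sequence must contain only the letters A, C, G and T.
    """
    width = len(matrix)
    n_windows = len(sequence) - width + 1
    counts = defaultdict(int)
    for start in range(n_windows):
        score = sum(matrix[j][sequence[start + j]] for j in range(width))
        counts[score] += 1
    return {score: c / n_windows for score, c in counts.items()}

def tail_probability(dist, threshold):
    """Fraction of windows scoring at or above the threshold."""
    return sum(p for score, p in dist.items() if score >= threshold)

if __name__ == "__main__":
    # Hypothetical toy data, chosen only to exercise the functions.
    background = {base: 0.25 for base in BASES}
    counts = [
        {"A": 8, "C": 1, "G": 1},
        {"C": 9, "A": 1},
        {"G": 7, "T": 3},
    ]
    pssm = log_odds_matrix(counts, background)
    theory = theoretical_distribution(pssm, background)
    observed = empirical_distribution(pssm, "TTACGGATACGCCTACGAAT")
    threshold = max(theory)  # score of the best possible site
    print("expected fraction:", tail_probability(theory, threshold))
    print("observed fraction:", tail_probability(observed, threshold))
```

An observed fraction well above the expected one at high scores would suggest that the scanned sequences are enriched in sites matching the matrix, which is the kind of contrast the abstract describes. A real analysis would need a background model fitted to the genome of interest and far longer sequences than this toy input.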
