Article

Evaluation of methods for modeling transcription factor sequence specificity

Journal

Nature Biotechnology
Volume 31, Issue 2, Pages 126-134

Publisher

Nature Publishing Group
DOI: 10.1038/nbt.2486

Funding

  1. Canadian Institutes of Health Research (CIHR)
  2. Canadian Institute for Advanced Research (CIFAR)
  3. Ontario Research Fund
  4. Genome Canada through the Ontario Genomics Institute
  5. March of Dimes
  6. CIHR [MOP-77721]
  7. US National Institutes of Health/National Human Genome Research Institute [R01 HG003985]
  8. US National Institutes of Health [R01HG003008, U54CA121852]
  9. John Simon Guggenheim Foundation
  10. Academy of Finland [260403]
  11. EU ERASysBio ERA-NET
  12. European Community [HEALTH-F4-2009-223575]
  13. Israel Science Foundation [802/08]
  14. Edmond J. Safra Bioinformatics Program at Tel Aviv University
  15. Ministry of Culture of Saxony-Anhalt [XP3624HP/0606T]
  16. US National Science Foundation (NSF) [PHY-1022140]
  17. NSF [PHY-0957573]
  18. Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory
  19. NSF Directorate for Computer & Information Science & Engineering
  20. NSF Division of Information & Intelligent Systems [1218201]

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA-binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
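
The abstract refers to scanning sequences with mononucleotide position weight matrices (PWMs) and to the information content of a motif. The Python sketch below illustrates both ideas on a hypothetical four-position toy motif. It is not code from the paper and does not reproduce any of the 26 evaluated methods; it assumes a uniform 0.25 background and uppercase A/C/G/T input, and the names `TOY_PFM`, `to_log_odds`, `scan` and `information_content` are illustrative only.

```python
import math

BASES = "ACGT"

# Hypothetical toy motif (not from the paper): probability of each base
# at each of 4 positions. Columns follow BASES order (A, C, G, T).
TOY_PFM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.1, 0.1, 0.1, 0.7],
]


def to_log_odds(pfm, background=0.25):
    """Convert base probabilities to log2-odds weights vs. a uniform background."""
    return [[math.log2(p / background) for p in row] for row in pfm]


def scan(seq, pwm):
    """Score every window of len(pwm) in seq (uppercase ACGT assumed).

    A mononucleotide PWM treats positions independently, so a window's
    score is simply the sum of one weight per position.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][BASES.index(b)] for i, b in enumerate(window))
        hits.append((start, score))
    return hits


def information_content(pfm):
    """Total information content in bits: sum over positions of (2 - entropy)."""
    total = 0.0
    for row in pfm:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        total += 2.0 - entropy
    return total


if __name__ == "__main__":
    pwm = to_log_odds(TOY_PFM)
    best = max(scan("TTAGCTGACT", pwm), key=lambda hit: hit[1])
    print(best[0], round(best[1], 3))  # prints "2 4.456" (window AGCT)
    print(round(information_content(TOY_PFM), 3))  # prints "1.93"
```

Because each window's score is a sum of independent per-position terms, a model of this form cannot represent dependencies between positions, which is one reason more complex models can outperform PWMs for some TFs. The toy motif's total of about 1.93 bits, out of a maximum of 8 for four positions, is an example of a relatively low-information, degenerate motif.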