4.7 Article

A transcription factor affinity-based code for mammalian transcription initiation

Journal

GENOME RESEARCH
Volume 19, Issue 4, Pages 644-656

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.085449.108

Funding

  1. NHGRI NIH HHS [R01 HG004065] Funding Source: Medline
  2. NIGMS NIH HHS [P50 GM081883] Funding Source: Medline

The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (~45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for ~70% of mammalian microRNAs based on currently available data.
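
The distinction the abstract draws between single-peaked and broad initiation regions can be illustrated with a minimal sketch. It assumes a CAGE tag cluster is given as a list of mapped tag positions and calls the cluster single-peaked when a chosen fraction of tags falls on one position. The function names and the 0.8 threshold are hypothetical choices made for illustration; the abstract does not state the criterion the authors used.

```python
from collections import Counter


def peak_fraction(tag_positions):
    """Return (dominant position, fraction of tags mapping to it)."""
    if not tag_positions:
        raise ValueError("cluster has no tags")
    counts = Counter(tag_positions)
    # Break ties toward the smallest coordinate so the result is deterministic.
    pos, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return pos, n / len(tag_positions)


def is_single_peak(tag_positions, min_fraction=0.8):
    """Assumed rule: single-peaked if >= min_fraction of tags share one position.

    The 0.8 default is an illustrative assumption, not a value from the paper.
    """
    _, fraction = peak_fraction(tag_positions)
    return fraction >= min_fraction


# Toy clusters (made-up coordinates, not real CAGE data).
sharp = [100] * 18 + [99, 101]
broad = [95, 97, 98, 100, 100, 102, 103, 105, 108, 110]

print(peak_fraction(sharp))
print(is_single_peak(sharp), is_single_peak(broad))
```

In the toy data, 18 of 20 tags in `sharp` map to position 100, so it passes the assumed threshold, while `broad` has no dominant position and does not.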
