Article

An Integrative Framework for Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2019.2901789

Keywords

Bioinformatics; machine learning; transcription factors binding sites; convolutional neural networks; DNA accessibility; histone modification

Funding

  1. National Natural Science Foundation of China [61873202, 61473232, 11661141019, 61621003]
  2. Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) [XDB13040600]
  3. National Ten Thousand Talent Program for Young Top-notch Talents
  4. Key Research Program of the Chinese Academy of Sciences [KFZD-SW-219]
  5. CAS Frontier Science Research Key Project for Top Young Scientist [QYZDB-SSW-SYS008]

The study predicts transcription factor binding sites with convolutional neural networks and examines how histone modifications and chromatin accessibility contribute to the prediction. By evaluating different network structures and sample lengths, it finds that DNA sequence, histone modifications, and chromatin accessibility make complementary contributions. The integrative CNN framework outperforms traditional machine learning methods.

Knowing the transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and the downstream cellular functions. Convolutional neural networks (CNNs) have outperformed conventional methods in predicting TFBSs from the primary DNA sequence. Beyond DNA sequence, histone modifications and chromatin accessibility are also important factors influencing binding activity, and they have recently been explored for TFBS prediction. However, current methods rarely incorporate histone modifications and chromatin accessibility into a CNN within an integrative framework. To this end, we developed a general CNN model that integrates these data for predicting TFBSs. We systematically benchmarked a series of architecture variants by changing the width and depth of the network, and explored the effect of sample length by varying the flanking regions. We evaluated the performance of the three types of data and their combinations on 256 ChIP-seq experiments and compared the model with competing machine learning methods. We find that the contributions of the three types of data are complementary. Moreover, the integrative CNN framework significantly outperforms traditional machine learning methods.
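To make the idea of integrating the three data types concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the inputs are combined, so the sketch assumes that histone modification and chromatin accessibility signals are available per base and are stacked as extra channels next to the one-hot encoded sequence. The function names (`one_hot`, `build_input`, `conv1d_max_pool`), the toy signals, and the random filters are all hypothetical and stand in for trained components.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N stay all zero."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded

def build_input(seq, histone, accessibility):
    """Stack sequence, histone, and accessibility channels into one (length, channels) array.

    Assumption: both epigenomic signals are given per base position.
    """
    histone = np.asarray(histone, dtype=np.float32).reshape(len(seq), -1)
    accessibility = np.asarray(accessibility, dtype=np.float32).reshape(len(seq), 1)
    return np.concatenate([one_hot(seq), histone, accessibility], axis=1)

def conv1d_max_pool(x, kernels):
    """Valid 1D convolution with ReLU, followed by global max pooling.

    x: (length, channels); kernels: (n_filters, width, channels).
    Returns one pooled activation per filter.
    """
    width = kernels.shape[1]
    # Collect every window of `width` consecutive positions: (n_windows, width, channels).
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    activations = np.einsum("pwc,fwc->pf", windows, kernels)
    return np.maximum(activations, 0.0).max(axis=0)

# Toy example: a 10 bp sample with one histone mark and one accessibility track.
seq = "ACGTNACGTA"
histone = np.ones(len(seq))                   # hypothetical histone signal
accessibility = np.linspace(0.0, 1.0, len(seq))  # hypothetical accessibility signal
x = build_input(seq, histone, accessibility)

rng = np.random.default_rng(0)
kernels = rng.standard_normal((3, 4, x.shape[1]))  # random stand-ins for learned filters
features = conv1d_max_pool(x, kernels)
```

In a full model, `features` would feed one or more dense layers ending in a binding probability, and the number of filters and convolutional layers would correspond to the width and depth that the abstract says were varied. Dropping the histone or accessibility channels from `build_input` gives the single-data-type and pairwise variants used for comparison.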
