CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 183

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115404

Keywords

Transcription factor binding sites; Convolutional neural networks; Motif discovery; Bioinformatics; Autoencoder

Funding

  1. National Natural Science Foundation of China [61772091, 61802035, 61702058, 61962006, 61962038, U1802271, U2001212, 62072311]
  2. China Postdoctoral Science Foundation [2017M612948]
  3. CCF-Huawei Database System Innovation Research Plan [CCF-HuaweiDBIR2020004A]
  4. Sichuan Science and Technology Program [2021JDJQ0021, 2020YFG0153, 20YYJC2785, 2020YJ0481, 2020YFS0466, 2020YJ0430, 2020JDR0164, 2020YFS0399, 2019YFS0067]
  5. Natural Science Foundation of Guangxi [2018GXNSFDA138005]
  6. Guangdong Basic and Applied Basic Research Foundation [2020B1515120028]
  7. Guangxi Bagui Teams for Innovation and Research [201979]
  8. Major Project of Digital Key Laboratory of Sichuan Province in Sichuan Conservatory of Music [21DMAKL02]

This work proposes CAE-CNN, a method that combines a convolutional autoencoder with a convolutional neural network to predict transcription factor binding sites. The autoencoder learns features from positive DNA nucleotide samples, and the combined model outperforms existing methods in accuracy, precision, recall, and AUC.

A transcription factor binding site (TFBS) is a DNA sequence to which a transcription factor binds to regulate the transcription of a gene. Although deep learning algorithms outperform traditional methods in predicting TFBSs, they often rely heavily on negative samples, which cannot be verified by experiment. In particular, training a model on such negative samples can introduce substantial noise and degrade classification performance. To address these drawbacks, we propose a new architecture, CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network), which combines a convolutional autoencoder with a convolutional neural network. Specifically, motivated by image reconstruction, we use a convolutional autoencoder to extract useful features from the positive DNA nucleotide samples. The learned features are then used by the convolutional neural network during training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments on human and mouse TFBS datasets demonstrate the effectiveness of the proposed method for the motif discovery task: it outperforms state-of-the-art methods in accuracy, precision, recall, and AUC. To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning to study TFBSs, yielding a more robust and generative TFBS prediction model.
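
To make the gated unit mentioned in the abstract concrete, the following is a minimal sketch of a highway layer applied to one-hot encoded nucleotides. It is not the authors' implementation: the abstract does not give the layer's equations, so the sketch assumes the commonly used highway formulation y = T(x) * H(x) + (1 - T(x)) * x with a ReLU transform H and a sigmoid gate T. The function names (`one_hot_encode`, `affine`, `highway_layer`), the A/C/G/T channel order, and the weights are all hypothetical, and the layer is applied directly to raw one-hot rows rather than to learned convolutional features.

```python
import math


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows (assumed order A, C, G, T).

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def affine(x, weights, bias):
    """Compute x @ weights + bias for a single feature vector x."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Assumed highway formulation: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [max(0.0, v) for v in affine(x, w_h, b_h)]                # transform H (ReLU)
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(x, w_t, b_t)]  # gate T (sigmoid)
    return [t_j * h_j + (1.0 - t_j) * x_j for t_j, h_j, x_j in zip(t, h, x)]


# Hypothetical weights chosen only to show the two extremes of the gate.
encoded = one_hot_encode("ACGTN")
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]

# Strongly negative gate bias: T is close to 0, so the input is carried through.
carried = [highway_layer(row, identity, [0.5] * 4, zeros, [-50.0] * 4) for row in encoded]

# Strongly positive gate bias: T is close to 1, so the output is close to H(x).
transformed = [highway_layer(row, identity, [0.5] * 4, zeros, [50.0] * 4) for row in encoded]
```

With the gate nearly closed, `carried` is numerically almost identical to `encoded`; with the gate nearly open, `transformed` is almost identical to the ReLU transform of each row. In a trained network the gate would take intermediate, input-dependent values.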
