Article

SAResNet: self-attention residual network for predicting DNA-protein binding

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 5

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab101

Keywords

DNA-protein binding; self-attention mechanism; deep residual network; transfer learning; sequence analysis; bioinformatics

Funding

  1. National Natural Science Foundation of China [62072243, 61772273]
  2. Natural Science Foundation of Jiangsu [BK20201304]
  3. Fundamental Research Funds for the Central Universities [30918011104]
  4. National Health and Medical Research Council of Australia (NHMRC) [1092262]
  5. Australian Research Council (ARC) [LP110200333, DP120104460]
  6. National Institute of Allergy and Infectious Diseases of the National Institutes of Health [R01 AI111965]
  7. Major Inter-Disciplinary Research (IDR) project - Monash University


Knowing the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. SAResNet, a transfer learning-based method, combines a self-attention mechanism with a residual network structure. Its pre-training strategy improves the network's learning ability, and the method reaches an average AUC of 92.0% across 690 ChIP-seq datasets.
Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, current state-of-the-art computational methods are limited by their reliance on small datasets with insufficient experimental data. To address this, we propose a novel transfer learning-based method, termed SAResNet, which combines the self-attention mechanism and residual network structure. More specifically, the attention-driven module captures the position information of the sequence, while the residual network structure ensures that the high-level features of the binding site can be extracted. In addition, the pre-training strategy used by SAResNet improves the learning ability of the network and accelerates its convergence during transfer learning. SAResNet is extensively tested on 690 datasets from ChIP-seq experiments, achieving an average AUC of 92.0%, which is 4.4% higher than that of the best state-of-the-art method currently available. On smaller datasets, the improvement in predictive performance is even more pronounced. Overall, we demonstrate that superior performance in DNA-protein binding prediction from DNA sequences can be achieved by combining the attention mechanism and residual structure, and we develop a novel pipeline accordingly. The proposed methodology is generally applicable and can be used to address other sequence classification problems.
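
The combination described in the abstract (an attention module over sequence positions followed by a residual structure) can be illustrated with a minimal sketch. This is not the authors' implementation: the weights are random and untrained, the residual block uses position-wise linear layers instead of the convolutions a real network would likely use, and every function name (`one_hot`, `self_attention`, `residual_block`, `forward`) is hypothetical. It only shows how scaled dot-product self-attention and a skip connection fit together on a one-hot encoded DNA sequence.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention; returns the output and the attention weights."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # one score per pair of positions
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def residual_block(x, w1, w2):
    """Simplified residual block: relu(x + F(x)) with F as two position-wise linear layers."""
    return relu(add(x, matmul(relu(matmul(x, w1)), w2)))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward(seq, seed=0):
    """Return an (untrained) binding probability and the attention weights for seq."""
    rng = random.Random(seed)
    d = len(BASES)
    x = one_hot(seq)
    wq, wk, wv = (random_matrix(d, d, rng) for _ in range(3))
    attended, weights = self_attention(x, wq, wk, wv)
    h = add(x, attended)                        # skip connection around attention
    h = residual_block(h, random_matrix(d, d, rng), random_matrix(d, d, rng))
    pooled = [max(col) for col in zip(*h)]      # global max pooling over positions
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit)), weights

if __name__ == "__main__":
    prob, attn = forward("ACGTACGTAC")
    print(f"untrained binding score: {prob:.3f}")
    print(f"attention matrix shape: {len(attn)} x {len(attn[0])}")
```

Each row of the attention matrix sums to 1 and describes how strongly one sequence position attends to every other position, which is the sense in which an attention module can capture position information. In a transfer learning setting, the weight matrices would be pre-trained on a large dataset and then fine-tuned on a smaller one rather than drawn at random as they are here.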
