DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning

Journal

BIOINFORMATICS
Volume 38, Pages ii62-ii67

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac469

Funding

  1. Israel Science Foundation [358/21]
  2. [ECCB2022]

This study presents DeepZF, a deep-learning-based pipeline that predicts which zinc fingers of a C2H2-ZF protein bind DNA and what their DNA-binding preferences are, given only the protein's amino acid sequence. Using in vivo and in vitro datasets with transfer learning, DeepZF achieved an average Pearson correlation greater than 0.94 over each of the three DNA binding positions and outperformed existing methods at predicting the DNA-binding preferences of C2H2-ZF proteins.
Motivation: Cys2His2 zinc-finger (C2H2-ZF) proteins are the largest class of human transcription factors and hence play central roles in gene regulation and cell function. C2H2-ZF proteins are characterized by a DNA-binding domain containing multiple ZFs. A subset of the ZFs bind diverse DNA triplets. Despite their central roles, little is known about which of their ZFs are binding and how the DNA-binding preferences are encoded in the amino acid sequence of each ZF.

Results: We present DeepZF, a deep-learning-based pipeline for predicting binding ZFs and their DNA-binding preferences given only the amino acid sequence of a C2H2-ZF protein. To the best of our knowledge, we compiled the first in vivo dataset of binding and non-binding ZFs for training the first ZF-binding classifier. Our classifier, which is based on a novel protein transformer, achieved an average AUROC of 0.71. Moreover, we took advantage of both in vivo and in vitro datasets to learn the recognition code of ZF-DNA binding through transfer learning. Our newly developed model, which is the first to utilize deep learning for the task, achieved an average Pearson correlation greater than 0.94 over each of the three DNA binding positions. Together, DeepZF outperformed extant methods in the task of C2H2-ZF protein DNA-binding preferences prediction: it achieved an average Pearson correlation of 0.42 in motif similarity compared with an average correlation smaller than 0.1 achieved by extant methods. By applying established interpretability techniques, we show that DeepZF inferred biologically relevant binding principles, such as the effect of amino acid residue positions on ZF DNA-binding potential.
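
To make the per-position evaluation concrete, the following is a minimal sketch of how a Pearson correlation could be computed separately for each of the three DNA binding positions. It is not the authors' code. The function names (`pearson`, `per_position_pearson`), the 3x4 matrix layout (three positions, base probabilities in A/C/G/T order), and the choice to pool values across all ZFs and bases at a position are assumptions made for illustration; the abstract does not specify how the correlation was aggregated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def per_position_pearson(predicted, measured):
    """Return one correlation per DNA binding position.

    Assumed (hypothetical) layout: each argument is a list of ZFs, and each
    ZF is a 3x4 matrix holding base probabilities (A, C, G, T) for the three
    positions of the bound DNA triplet. Values at a given position are pooled
    over all ZFs and all four bases before correlating.
    """
    correlations = []
    for pos in range(3):
        pred_vals = [v for zf in predicted for v in zf[pos]]
        meas_vals = [v for zf in measured for v in zf[pos]]
        correlations.append(pearson(pred_vals, meas_vals))
    return correlations


# Toy, made-up matrices for two ZFs (not data from the paper).
measured = [
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.25, 0.25, 0.25, 0.25]],
    [[0.1, 0.6, 0.2, 0.1], [0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]],
]
predicted = [
    [[0.6, 0.2, 0.1, 0.1], [0.2, 0.1, 0.6, 0.1], [0.3, 0.2, 0.3, 0.2]],
    [[0.2, 0.5, 0.2, 0.1], [0.3, 0.5, 0.1, 0.1], [0.1, 0.2, 0.1, 0.6]],
]
print(per_position_pearson(predicted, measured))
```

Averaging the three returned values would give a single summary score, but whether the reported figure was averaged over ZFs, proteins, or cross-validation folds is not stated in the abstract.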
