4.7 Article

EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 160, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2023.107030

Keywords

Bioinformatics; DNA methylation; Transfer learning; Ensemble learning; Nucleotide sequence representation

Ask authors/readers for more resources

DNA methylation is a major epigenetic modification that regulates biological processes without altering the DNA sequence. This study proposes a feature representation framework called EpiTEAmDNA, which integrates convolutional neural network and conventional machine learning methods. It shows improved performances compared to existing deep learning methods on small datasets across multiple DNA methylation types.
Methylation is a major DNA epigenetic modification for regulating the biological processes without altering the DNA sequence, and multiple types of DNA methylations have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches were developed to automatically identify the DNA methylation residues using machine learning or deep learning algorithms. The machine learning (ML) based methods are difficult to be transferred to the other predicting tasks of the DNA methylation sites using additional knowledge. Deep learning (DL) may facilitate the transfer learning of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes an integrated feature representation framework EpiTEAmDNA based on the stra-tegies of transfer learning and ensemble learning, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates convolutional neural network (CNN) and conventional machine learning methods, and shows improved performances than the existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggests that the EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. The evaluation experiments on the independent test datasets also suggest that the proposed EpiTEAmDNA framework outperforms the existing models in most prediction tasks of the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformat icslab.org/supp/.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available