Article

EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species

Related references

Note: Only some of the references are listed.
Article Biochemistry & Molecular Biology

4mCBERT: A computing tool for the identification of DNA N4-methylcytosine sites by sequence- and chemical-derived information based on ensemble learning strategies

Sen Yang et al.

Summary: In this study, a model called 4mCBERT is developed to identify 4mC sites by encoding DNA sequence segments using sequence characteristics such as one-hot encoding, electron-ion interaction pseudopotential, nucleotide chemical property, word2vec, and chemical bidirectional encoder representations from transformers (chemical BERT). The 4mCBERT model shows higher performance than other state-of-the-art models on six independent benchmark datasets, and users can use it to predict 4mC sites and retrain prediction models.

INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES (2023)

Article Biochemical Research Methods

EDCNN: identification of genome-wide RNA-binding proteins using evolutionary deep convolutional neural network

Yawei Wang et al.

Summary: In this study, a new algorithm, Evolutionary Deep Convolutional Neural Network (EDCNN), was proposed for identifying protein-RNA interactions. Experimental results demonstrated the superior performance of EDCNN on large-scale CLIP-seq datasets. Furthermore, analyses of time complexity, parameters, and motifs confirmed the effectiveness of the proposed algorithm.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

PreRBP-TL: prediction of species-specific RNA-binding proteins based on transfer learning

Jun Zhang et al.

Summary: A computational method called PreRBP-TL is introduced in this study for identifying species-specific RBPs based on transfer learning, which outperforms other computational methods. The authors have also established a web server for the convenience of researchers.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

StackTADB: a stacking-based ensemble learning model for predicting the boundaries of topologically associating domains (TADs) accurately in fruit flies

Hao Wu et al.

Summary: This study proposes a novel ensemble learning framework called StackTADB for predicting the boundaries of TADs. Through data analysis and performance comparison, StackTADB is shown to have superior performance in predicting TAD boundaries. Additionally, k-mer-based features play an important role in predicting TAD boundaries in fruit flies.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

The WID-BC-index identifies women with primary poor prognostic breast cancer based on DNA methylation in cervical samples

James E. Barrett et al.

Summary: The study developed a DNA methylation-based index called WID-BC-index that can identify women with breast cancer using cervical samples, with high accuracy. The researchers also found that CpGs at progesterone receptor binding sites, which are hypomethylated in breast tissue of women with breast cancer, are also hypomethylated in cervical samples of women with poor prognostic breast cancer, indicating a systemic epigenetic programming defect prevalent in women who develop breast cancer. Validation of the WID-BC-index may have clinical implications in monitoring breast cancer risk.

NATURE COMMUNICATIONS (2022)

Article Biochemical Research Methods

DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning

Akila Katuwawala et al.

Summary: DisoLipPred is the first predictor of disordered lipid-binding residues, utilizing innovative features including transfer learning, a bypass module, and expanded inputs to improve predictive quality. The results are accurate and surpass existing tools, providing complementary predictions to current methods.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding

Hao Wu et al.

Summary: This study proposes a high-accuracy cell classification algorithm, scHiCStackL, based on single-cell Hi-C data. The algorithm improves the data preprocessing method and constructs a two-layer stacking ensemble model for classifying cells. Experimental results show that scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Biochemical Research Methods

HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction

Yaqi Zhang et al.

Summary: This study presents a novel HLAB feature engineering algorithm for detecting HLA-I binding peptides using natural language processing and deep neural networks. The experimental results show that the proposed algorithm outperforms existing methods in predicting peptides binding to specific HLA alleles, achieving the best performance in most prediction tasks.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Biochemical Research Methods

Hyb4mC: a hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction

Ying Liang et al.

Summary: The study proposes a flexible deep learning-based framework called Hyb4mC for predicting 4mC sites. It adopts the DNA2vec method for sequence embedding and uses two different subnets for further analysis. The experimental results show that Hyb4mC can significantly enhance the performance of predicting 4mC sites.

BMC BIOINFORMATICS (2022)

Article Biochemical Research Methods

Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning

Haodong Xu et al.

Summary: This study reviewed the computational prediction of 4mC sites and developed Deep4mC, a deep learning-based model, which showed high accuracy and robust performance in predicting putative 4mC sites in genomes of various species. With feature optimization and reinforcement learning, Deep4mC achieved significant improvement in performance compared to previous tools.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites

Quanzhong Liu et al.

Summary: A deep learning approach named DeepTorrent is proposed for improved prediction of 4mC sites from DNA sequences, utilizing multi-layer convolutional neural networks with an inception module integrated with bidirectional long short-term memory to learn higher-order feature representations. Dimension reduction and concatenated feature maps from filters of different sizes are applied for prediction.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops

Fu-Ying Dao et al.

Summary: YY1 protein forms dimers that enhance enhancer-promoter interactions, a general feature of mammalian gene control. A deep learning algorithm named DeepYY1 has been developed to efficiently identify YY1-mediated chromatin loops. Sequences play a crucial role in the formation of YY1-mediated chromatin loops.

BRIEFINGS IN BIOINFORMATICS (2021)

Review Biochemical Research Methods

iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization

Yingying Yu et al.

Summary: DNA methylation is important for understanding disease development. The iDNA-ABT deep learning model outperforms existing methods in predicting different DNA methylation types, demonstrating strong adaptability and generalization across species.

BIOINFORMATICS (2021)

Review Oncology

DNA methylation age as a biomarker for cancer

Chung-Ho E. Lau et al.

Summary: Cancer is known to be associated with aging, with declines in DNA repair and epigenetic maintenance mechanisms potentially contributing to cancer development. DNA methylation levels, as assessed through epigenetic clocks, are promising markers for predicting cancer risk and mortality. Further research on biological age biomarkers is likely to provide more insights into the connections between aging and cancer.

INTERNATIONAL JOURNAL OF CANCER (2021)

Article Biochemical Research Methods

BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides

Phasit Charoenkwan et al.

Summary: The BERT4Bitter model presented in this study utilizes a bidirectional encoder representation from transformers to predict bitter peptides directly from their amino acid sequence, achieving superior performance. Compared to traditional machine learning models, BERT4Bitter shows significant improvements in accuracy and correlation coefficient, offering a new approach for identifying bitter peptides.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

The stacking strategy-based hybrid framework for identifying non-coding RNAs

Xin Wang et al.

Summary: The study presents a hybrid framework for identifying ncRNAs, incorporating eight features including predicted peptides. The framework performs well in cross-species ncRNA identification, achieving especially high accuracy on the Arabidopsis, worm, and zebrafish datasets.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information

Nguyen Quoc Khanh Le et al.

Summary: The study incorporated a BERT-based multilingual model in bioinformatics to represent DNA sequence information, showing significant improvement in sensitivity, specificity, accuracy, and Matthews correlation coefficient for DNA enhancer prediction. Advanced experiments revealed the potential of deep learning, particularly through 2D CNN, in learning BERT features for biological modeling.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction

Xiaodi Yang et al.

Summary: This study uses machine learning and transfer learning methods to predict human-virus protein interactions, utilizing a combination of Siamese CNN architecture and multi-layer perceptron for improved predictions. The introduced transfer learning methods reliably predict interactions in different domains by retraining CNN layers.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods

Mingming Jiang et al.

Summary: This study developed an interpretable stacking model, NeuroPpred-Fuse, for the prediction of neuropeptides through fusing sequence-derived features and feature selection methods. The model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming current state-of-the-art models, demonstrating strong generalization ability.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Multidisciplinary Sciences

Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications

Zitao Song et al.

Summary: A recent study introduces MultiRM, a method that predicts and interprets twelve common post-transcriptional RNA modifications simultaneously, revealing potential associations among different types of RNA modifications. This research offers a solution for detecting multiple RNA modifications and gaining a deeper understanding of the mechanisms behind sequence-based RNA modifications.

NATURE COMMUNICATIONS (2021)

Article Genetics & Heredity

The DNA methylation landscape of advanced prostate cancer

Shuang G. Zhao et al.

NATURE GENETICS (2020)

Article Medicine, Research & Experimental

An Interpretable Prediction Model for Identifying N7-Methylguanosine Sites Based on XGBoost and SHAP

Yue Bi et al.

MOLECULAR THERAPY-NUCLEIC ACIDS (2020)

Article Biochemical Research Methods

MTTFsite: cross-cell type TF binding site prediction by using multi-task learning

Jiyun Zhou et al.

BIOINFORMATICS (2019)

Article Medicine, Research & Experimental

iPseU-CNN: Identifying RNA Pseudouridine Sites Using Convolutional Neural Networks

Muhammad Tahir et al.

MOLECULAR THERAPY-NUCLEIC ACIDS (2019)

Review Oncology

DNA Methylation Readers and Cancer: Mechanistic and Therapeutic Applications

Niaz Mahmood et al.

FRONTIERS IN ONCOLOGY (2019)

Article Genetics & Heredity

SNNRice6mA: A Deep Learning Method for Predicting DNA N6-Methyladenine Sites in Rice Genome

Haitao Yu et al.

FRONTIERS IN GENETICS (2019)

Article Computer Science, Information Systems

4mCCNN: Identification of N4-Methylcytosine Sites in Prokaryotes Using Convolutional Neural Network

Jhabindra Khanal et al.

IEEE ACCESS (2019)

Review Cell Biology

Dynamics and function of DNA methylation in plants

Huiming Zhang et al.

NATURE REVIEWS MOLECULAR CELL BIOLOGY (2018)

Article Cell Biology

Pan-Cancer Landscape of Aberrant DNA Methylation across Human Tumors

Sadegh Saghafinia et al.

CELL REPORTS (2018)

Article Multidisciplinary Sciences

Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites

Yanan Wang et al.

SCIENTIFIC REPORTS (2017)

Article Biochemical Research Methods

HIPred: an integrative approach to predicting haploinsufficient genes

Hashem A. Shihab et al.

BIOINFORMATICS (2017)

Article Biochemistry & Molecular Biology

Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications

Magdalena J. Koziol et al.

NATURE STRUCTURAL & MOLECULAR BIOLOGY (2016)

Article Biochemistry & Molecular Biology

N6-Methyldeoxyadenosine Marks Active Transcription Start Sites in Chlamydomonas

Ye Fu et al.

Review Neurosciences

DNA Methylation and Its Basic Function

Lisa D. Moore et al.

NEUROPSYCHOPHARMACOLOGY (2013)

Article Developmental Biology

Tet family proteins and 5-hydroxymethylcytosine in development and disease

Li Tan et al.

DEVELOPMENT (2012)

Article Computer Science, Artificial Intelligence

Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data

Mohak Shah et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2012)

Article Biochemical Research Methods

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Weizhong Li et al.

BIOINFORMATICS (2006)

Review Genetics & Heredity

DNA methylation and human disease

KD Robertson

NATURE REVIEWS GENETICS (2005)