4.6 Article

DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning

Journal

BMC BIOINFORMATICS
Volume 23, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-04527-4

Keywords

Differentially expressed genes; Convolutional neural network; Classification; Transfer learning; Disease biomarkers

Funding

  1. Fulbright-Nehru Fellowship

Ask authors/readers for more resources

In this study, a CNN-based model called DEGnext was developed to predict upregulating (UR) and downregulating (DR) genes from RNA-seq cancer datasets. The results showed that DEGnext outperforms traditional machine learning methods and effectively transfers learned feature maps to unseen datasets, which is significant for biomarker research and analysis of other disease datasets.
Background: A limitation of traditional differential expression analysis on small datasets involves the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state-of-the-art in disease biomarker prediction from RNA-seq data using DL. However, application of DL to RNA-seq data is challenging due to absence of appropriate labels and smaller sample size as compared to number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. Results: We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer. Conclusions: DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available