Article

Triphasic DeepBRCA - A Deep Learning-Based Framework for Identification of Biomarkers for Breast Cancer Stratification

Journal

IEEE ACCESS
Volume 9, Pages 103347-103364

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3093616

Keywords

Breast cancer; Biomarkers; Gene expression; Neural networks; Deep learning; Cancer; Tools; Auto-encoder; biomarker genes; breast cancer subtype classification; deep learning; Innvestigate tool; TCGA

Triphasic DeepBRCA, a deep learning framework, classifies breast cancer subtypes from gene expression data and identifies 54 most-variant genes, over 30 of which are significantly linked to prognostic outcomes.
Breast cancer is a major cause of cancer death and demands urgent attention. Next-generation sequencing techniques that capture gene expression data have recently been used successfully to detect breast cancer. This work identifies a small set of biomarker genes for molecular stratification of breast cancer subtypes. We propose Triphasic DeepBRCA, a novel deep learning framework for breast cancer subtype detection and biomarker discovery. In the first phase, an autoencoder extracts a compact representation of the gene expression data. In the second phase, this representation is fed to a supervised feed-forward neural network that classifies breast cancer subtypes. In the third phase, the proposed Biomarker Gene Discovery Algorithm (BGDA) leverages the second-phase classifier to estimate the relevance of individual genes. A Wilcoxon rank-sum test with False Discovery Rate (FDR) correction is then applied to identify the most differentiating genes. On the TCGA BRCA RNASeq data, the framework yielded a set of 54 most-variant genes. Using 10-fold cross-validation, we obtained a mean accuracy of 0.899 +/- 0.04 (95% confidence interval). We also validated our results on the METABRIC dataset. Gene set analysis revealed statistically enriched pathways. A heatmap of the expression levels and a t-SNE visualization show that these genes collectively distinguish among the different breast cancer subtypes. Finally, prognostic evaluation of the 54 biomarkers showed that over 30 of them are significantly linked to prognostic outcome.
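
The statistical filtering step in the third phase (a Wilcoxon rank-sum test followed by FDR correction) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here. The one-subtype-versus-rest grouping, the 0.05 threshold, the normal approximation for the p-value, and all function and variable names are likewise illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def rank_sum_pvalue(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentiating_genes(
    expr: Dict[str, Sequence[float]],
    in_subtype: Sequence[bool],
    alpha: float = 0.05,  # assumed threshold, not taken from the abstract
) -> List[str]:
    """Return genes whose expression differs between one subtype and the rest."""
    genes = list(expr)
    pvals = []
    for gene in genes:
        inside = [v for v, flag in zip(expr[gene], in_subtype) if flag]
        outside = [v for v, flag in zip(expr[gene], in_subtype) if not flag]
        pvals.append(rank_sum_pvalue(inside, outside))
    qvals = benjamini_hochberg(pvals)
    return [gene for gene, q in zip(genes, qvals) if q <= alpha]


# Toy data (fabricated for illustration only): 8 samples in the subtype, 8 outside.
toy_labels = [True] * 8 + [False] * 8
toy_expr = {
    "gene_a": [10, 11, 12, 13, 14, 15, 16, 17, 1, 2, 3, 4, 5, 6, 7, 8],
    "gene_b": [1, 3, 5, 7, 9, 11, 13, 15, 2, 4, 6, 8, 10, 12, 14, 16],
}
print(differentiating_genes(toy_expr, toy_labels))
```

In practice a library routine such as SciPy's rank-sum test would normally replace the hand-written approximation. The pure-Python version is used here only to keep the sketch self-contained.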
