Article

Machine learning assisted analysis of breast cancer gene expression profiles reveals novel potential prognostic biomarkers for triple-negative breast cancer

Journal

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
Volume 20, Pages 1618-1631

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.03.019

Keywords

TNBC; Differential gene expression; Distant-metastasis free survival; Prognostic gene signatures; POU2AF1; S100B

Funding

  1. Department of Biotechnology (DBT), Government of India [BT/PR40151/BTIS/137/5/2021]
  2. Council of Scientific and Industrial Research (CSIR), Government of India [09/512(0227)/2017-EMR-I, 09/512 (0212)/2016-EMR-I]
  3. GlaxoSmithKline (GSK, India) [RCB/PhD-BI/2020/1016]

Tumor heterogeneity and unclear metastasis mechanisms are the main challenges in treating triple-negative breast cancer (TNBC). In this study, gene expression datasets were analyzed to identify gene signatures that can differentiate TNBC from other breast cancer subtypes. Machine learning algorithms were used to evaluate the performance of these signatures, and potential prognostic genes were identified. Pathway enrichment analyses indicated the functional role of these genes in the metastasis cascade.
Tumor heterogeneity and unclear mechanisms of metastasis are the leading causes of the lack of effective targeted therapy for triple-negative breast cancer (TNBC), a breast cancer (BrCa) subtype characterized by high mortality and a high frequency of distant metastasis. Identifying prognostic biomarkers can improve prognosis and personalized treatment regimens. Herein, we collected gene expression datasets representing TNBC and non-TNBC BrCa. From the complete dataset, a subset containing only known cancer driver genes was also constructed. Recursive Feature Elimination (RFE) was employed to identify signatures of the top 20, 25, 30, 35, 40, 45, and 50 genes that differentiate TNBC from the other BrCa subtypes. Five machine learning algorithms were applied to these selected features; based on model performance evaluation, XGBoost performed best on a 25-gene subset of the complete dataset and on a 20-gene subset of the driver dataset. Of these 45 genes from the two datasets, 34 were found to be differentially regulated. Kaplan-Meier (KM) analysis of distant metastasis-free survival (DMFS) for these 34 differentially regulated genes revealed four potential prognostic genes, two of which (POU2AF1 and S100B) are novel. Finally, interactome and pathway enrichment analyses were carried out to investigate the functional role of the identified potential prognostic genes in TNBC. These genes are associated with the MAPK, PI3K-Akt, Wnt, TGF-β, and other signal transduction pathways that are pivotal in the metastasis cascade. These gene signatures can provide novel molecular-level insights into metastasis.
© 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
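To make the Kaplan-Meier step mentioned in the abstract more concrete, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' code: the function names `kaplan_meier` and `split_by_median`, the median-based grouping of patients by expression, and the toy follow-up values are all assumptions introduced for illustration, and the abstract does not state how patients were stratified.

```python
from collections import Counter
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient (e.g. months)
    events : 1 if distant metastasis was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events plus censored patients leaving at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying event-free at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def split_by_median(expression):
    """Hypothetical grouping rule: True where expression is above the median."""
    cutoff = median(expression)
    return [x > cutoff for x in expression]


# Toy data, invented for illustration only (not taken from the paper).
toy_times = [6, 12, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
# prints:
# 6 0.8
# 12 0.6
# 20 0.3
```

In practice, one curve would be estimated for each expression group of a candidate gene (for example, the two groups returned by a rule like `split_by_median`), and the curves would then be compared with a log-rank test, which this sketch leaves out.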
