Article

Machine learning integrated ensemble of feature selection methods followed by survival analysis for predicting breast cancer subtype specific miRNA biomarkers

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 131

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2021.104244

Keywords

Breast cancer; Cox regression; Drug repurposing; Feature selection; Machine learning; miRNA sequencing

Funding

  1. Department of Science and Technology, India [DST/INT/POL/P-36/2016]

Breast cancer is the second most common cancer type among females, and microRNAs play a crucial role in regulating gene expression at the post-transcriptional phase. Using Next Generation Sequencing data, the study identifies significant miRNA biomarkers through a machine learning approach that combines feature selection with survival analysis.
Breast cancer is the second leading cancer type among females. MicroRNAs (miRNAs) play an important role in it by regulating gene expression at the post-transcriptional phase. Identifying the most influential miRNAs in breast cancer subtypes is a challenging task, but recent advances in Next Generation Sequencing (NGS) techniques allow the analysis of high-throughput miRNA expression data. We therefore used NGS data of breast cancer to identify the most significant miRNA biomarkers, which are highly associated with multiple breast cancer subtypes. For this purpose, we propose a two-phase technique, called Machine Learning Integrated Ensemble of Feature Selection Methods, followed by survival analysis. In the first phase, we select the best among seven machine learning techniques based on classification accuracy using the entire set of features (in this case, miRNAs). Subsequently, eight different feature selection methods are used separately to rank the features, and each set of top features is validated using the selected machine learning technique on a multi-class classification task over the breast cancer subtypes. In the second phase, based on the classification accuracy values, the top features from each feature selection method are combined into an ensemble that further categorizes the miRNAs as 8*, 7*, down to 1*. The 8* miRNAs provide the highest average classification accuracy of 86% after 10-fold cross-validation. Thereafter, 27 miRNAs are identified from the 8* to 4* miRNAs based on their importance for survival in breast cancer subtypes, using Cox regression based survival analysis.
Moreover, expression analysis, regulatory network analysis, protein-protein interaction analysis, and KEGG pathway and gene ontology enrichment analysis are performed to validate the biological significance of the proposed solution. Additionally, we have prepared a miRNA-protein-drug interaction network to identify possible drugs for the selected miRNAs. Thus, our findings may be considered during clinical trials for the treatment of breast cancer patients.
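
The ensemble step in the second phase can be illustrated with a minimal Python sketch. It assumes that a miRNA's star rating equals the number of feature selection methods whose top list contains it, which is one plausible reading of the abstract and not a confirmed detail of the authors' method. The method names, the miRNA identifiers, and the helper functions `star_ratings` and `group_by_stars` are hypothetical placeholders, and three methods are used here only to keep the example short (the paper uses eight).

```python
from collections import Counter


def star_ratings(top_features_by_method):
    """Count, for each feature, how many methods list it among their top features.

    Assumption: this count is the feature's star rating (8* would mean
    "selected by all eight methods" in the paper's setting).
    """
    counts = Counter()
    for features in top_features_by_method.values():
        counts.update(set(features))  # count each feature once per method
    return dict(counts)


def group_by_stars(ratings):
    """Group features by star rating, highest rating first."""
    groups = {}
    for feature, stars in ratings.items():
        groups.setdefault(stars, []).append(feature)
    return {stars: sorted(groups[stars]) for stars in sorted(groups, reverse=True)}


# Hypothetical top lists from three feature selection methods.
top_lists = {
    "method_a": ["mirna_1", "mirna_2", "mirna_3"],
    "method_b": ["mirna_1", "mirna_3", "mirna_4"],
    "method_c": ["mirna_1", "mirna_2", "mirna_5"],
}

ratings = star_ratings(top_lists)
groups = group_by_stars(ratings)

for stars, names in groups.items():
    print(f"{stars}*: {', '.join(names)}")
```

Under this reading, the highest-rated group is the set of miRNAs that every method agrees on, and lower-rated groups could then be filtered further, for example by the Cox regression based survival analysis the abstract mentions.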
