Identification of Triple Negative Breast Cancer Genes Using Rough Set Based Feature Selection Algorithm & Ensemble Classifier

HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES (2022)

Journal

HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES

Volume 12, Issue -, Pages -

Publisher

KOREA INFORMATION PROCESSING SOC

DOI: 10.22967/HCIS.2022.12.054

Keywords

Ensemble Classifier; Machine-Learning Technique; Microarray Data; Robust Multi-Array Average Technique; Rough Set Theory; Triple Negative Breast Cancer

Funding

Ministry of Education, Youth, and Sports [SP2022/18, SP2022/34, SP2022/5]

Automated Summary

Microarray datasets have been important in detecting triple negative breast cancer (TNBC) in recent decades. However, classifying microarray data is challenging due to redundant and irrelevant features. Feature selection is essential to eliminate unneeded features from the system. The rough set-based feature selection algorithm is used to choose optimal feature values, and the ensemble classification technique is employed for classifying low-risk genes (non-TNBC) and high-risk genes (TNBC). Experimental evaluation demonstrates that the ensemble-based rough set model achieves a mean accuracy of 97.24%, surpassing the other machine learning techniques it was compared with.

Abstract

In recent decades, microarray datasets have played an important role in triple negative breast cancer (TNBC) detection. Microarray data classification is a challenging process due to the presence of numerous redundant and irrelevant features. Feature selection is therefore indispensable in this research field, as it eliminates unneeded feature vectors from the system. Selecting an optimal number of features significantly reduces the difficulty of this NP-hard problem, so a rough set-based feature selection algorithm is used in this manuscript to select the optimal feature values. Initially, the datasets related to TNBC are acquired from the Gene Expression Omnibus (GEO) series GSE45827, GSE76275, GSE65194, GSE3744, GSE21653, and GSE7904. Then, a robust multi-array average technique is used to eliminate the outlier TNBC/non-TNBC samples, which helps enhance classification performance. Next, the pre-processed microarray data are fed to a rough set theory-based method for optimal gene selection, and the selected genes are given as inputs to the ensemble classification technique, which classifies them as low-risk genes (non-TNBC) or high-risk genes (TNBC). The experimental evaluation showed that the ensemble-based rough set model obtained a mean accuracy of 97.24%, which is superior to the other machine learning techniques it was compared with.
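The rough set-based gene selection step can be illustrated with a minimal sketch. The abstract does not say which rough set algorithm the authors used, so the code below assumes a common greedy variant (QuickReduct) driven by the rough set dependency degree. The toy data, the four gene columns, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full-set dependency degree is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    best = dependency(rows, labels, selected)
    while best < target:
        candidates = [a for a in all_attrs if a not in selected]
        # Highest dependency wins; ties go to the lowest column index.
        best, neg_a = max(
            (dependency(rows, labels, selected + [a]), -a) for a in candidates
        )
        selected.append(-neg_a)
    return selected


# Hypothetical, already discretized expression levels (0 = low, 1 = high)
# for four made-up genes; each row is one sample.
rows = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
labels = ["TNBC", "TNBC", "non-TNBC", "non-TNBC", "non-TNBC", "TNBC"]

selected = quick_reduct(rows, labels)
print(selected)  # prints [1, 3]
```

In this toy table, columns 1 and 3 together separate the two classes as well as all four columns do, so the remaining two columns are dropped as redundant. Real microarray values are continuous and would need to be discretized before a dependency-based reduct of this kind could be computed.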
