4.7 Review

Gene reduction and machine learning algorithms for cancer classification based on microarray gene expression data: A comprehensive review

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 213, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.118946

Keywords

Microarray gene expression; Data reduction; Feature selection; Feature extraction; Machine learning; Cancer classification

This review explores the applications of machine learning-based data reduction and classification algorithms in microarray gene expression data. It summarizes various data preprocessing methods, reviews different feature selection algorithms, and discusses feature extraction and hybrid methods. It also examines widely used machine learning algorithms for tumor and nontumor classification. Finally, the challenges and unanswered questions in accurate cancer classification and detection are highlighted.
Disease diagnosis and prediction methods in biotechnology and medicine have significantly advanced over time. Consequently, analyzing raw gene expression is crucial for identifying diseases such as cancer. Interestingly, microarrays are a tool that records gene expression from deoxyribonucleic acid (DNA) or ribonucleic acid. This technique exhibits intriguing characteristics, such as generating high-dimensional data with a small sample size. However, with such datasets, the classification model is prone to overfitting. This limitation can be overcome by reducing the dimensions of the microarray datasets to a reasonable number. Machine learning (ML)-based data reduction has recently attracted considerable attention in genomic research. Therefore, this review examines recent studies that present state-of-the-art data reduction and classification algorithms for microarray gene expression data to diagnose tumors and analyzes their performance. To the best of our knowledge, this is the first review that provides a comprehensive view of data preprocessing, dimensionality reduction (including feature, i.e., gene, selection, feature extraction, and their hybrid), and ML algorithms. The paper is structured as follows. First, this review summarizes several data preprocessing methods applied to gene expression datasets. Then, various ML-based feature selection algorithms, including filter, wrapper, embedded, ensemble, and hybrid algorithms, are reviewed in detail. These algorithms are examined under three main classes: supervised, unsupervised, and semisupervised ML. Next, feature extraction algorithms and hybrids of feature extraction and selection algorithms are thoroughly reviewed. Furthermore, a detailed review of broadly applied ML algorithms to simplify tumor and nontumor classification using microarray datasets is presented. Finally, the challenges and open questions related to gene expression datasets for accurate cancer classification and detection are highlighted.
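
As a rough illustration of the filter-style gene selection mentioned in the abstract, the sketch below ranks genes by a Welch-style t statistic between two classes and keeps the top k. It is not taken from the review: the synthetic data, the scoring rule, and the names `make_synthetic`, `t_like_score`, and `select_top_genes` are all assumptions chosen for the example.

```python
import random
import statistics


def make_synthetic(n_samples=40, n_genes=500, n_informative=5, shift=4.0, seed=0):
    """Toy stand-in for a microarray: many genes, few samples.

    Assumption for illustration: the first n_informative genes are shifted
    upward in class 1; all other genes are pure Gaussian noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate classes so both are equally represented
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            for g in range(n_informative):
                row[g] += shift
        X.append(row)
        y.append(label)
    return X, y


def t_like_score(values0, values1):
    """Absolute Welch-style t statistic for one gene across two classes."""
    m0, m1 = statistics.fmean(values0), statistics.fmean(values1)
    v0, v1 = statistics.variance(values0), statistics.variance(values1)
    denom = (v0 / len(values0) + v1 / len(values1)) ** 0.5
    return abs(m0 - m1) / denom if denom > 0 else 0.0


def select_top_genes(X, y, k):
    """Filter selection: score every gene independently, keep the k best."""
    scores = []
    for g in range(len(X[0])):
        col0 = [row[g] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((t_like_score(col0, col1), g))
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:k])


X, y = make_synthetic()
selected = select_top_genes(X, y, k=5)
# Reduced matrix: same samples, only the selected gene columns.
X_reduced = [[row[g] for g in selected] for row in X]
print(selected)
```

In practice the scores would be computed on training samples only, so that the held-out samples used to evaluate the classifier do not influence which genes are kept.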
