Article

Predictive modelling for molecular cancer profile classification using hybrid learning techniques

Journal

SOFT COMPUTING
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s00500-023-08126-8

Keywords

Cancer prediction; Random forest; Decision tree; t-SNE algorithm; Dimensionality reduction; Classification


Cancer is caused by the transformation of a normal cell into an uncontrollable malignant tumor cell. Microarray analysis is used to classify and predict cancers. Although gene expression classifications remain a challenge, recent advances in microarray technologies have made it possible to discover tumor-specific biomarkers and improve accuracy.
Cancer arises when a normal cell is progressively transformed into a malignant tumor cell that grows uncontrollably. Microarray analysis is used in the classification and prognosis of cancers. Recent advances in microarray technology make it possible to process thousands of gene expressions in parallel and to discover novel tumor-specific biomarkers from them. Although several studies have classified cancers using statistical, probabilistic, and machine learning-based approaches, gene expression classification remains a challenge, and its outcomes can be matters of life and death for patients. According to earlier studies, microarray genomic data poses two significant challenges: first, selecting the prominent genes from microarray profiles, and second, choosing the minimum number of genes needed for good diagnostic classification. This work proposes a hybrid feature selection technique that combines random forest (RF) with particle swarm optimization (PSO). Principal component analysis (PCA) is then used to select gene features and reduce dimensionality, and an autoencoder (AE) predicts cancer types from the filtered gene inputs. The experiments use The Cancer Genome Atlas RNA-Seq PANCAN dataset, which contains gene expressions for patients with five types of cancer. An empirical study shows that the proposed hybrid approach achieved 98.77% accuracy. To assess the model's efficacy, the identical set of genes was fed into five benchmark classification algorithms, and performance was measured with accuracy, precision, recall, F1-score, and ROC. The comparative investigation shows that the proposed model outperformed the others and reached higher accuracy. Despite the 98.77% accuracy, the system does not handle the dataset imbalance problem. To address this, we plan to use a data augmentation strategy to generate synthetic data samples and build a more robust learning model. We also aim to design a dataset of omics and non-omics features for a better understanding of a disease.
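
The abstract reports accuracy alongside precision, recall, and F1-score, and notes that class imbalance is not handled. As a minimal sketch of why the additional metrics matter under imbalance, the Python snippet below computes accuracy and macro-averaged precision, recall, and F1 for a multi-class prediction task. It is not the authors' evaluation code: the function name `evaluate_predictions`, the class labels, and the toy predictions are all hypothetical, and ROC analysis is omitted because it requires predicted scores rather than hard labels.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = Counter()  # true positives per class
    fp = Counter()  # false positives per class
    fn = Counter()  # false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    def safe_div(num, den):
        # Define the ratio as 0.0 when a class has no predictions or no samples.
        return num / den if den else 0.0

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        f1 = safe_div(2 * precision * recall, precision + recall)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    n_classes = len(per_class)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(m["precision"] for m in per_class.values()) / n_classes,
        "macro_recall": sum(m["recall"] for m in per_class.values()) / n_classes,
        "macro_f1": sum(m["f1"] for m in per_class.values()) / n_classes,
        "per_class": per_class,
    }


# Toy, imbalanced example with three hypothetical cancer classes.
y_true = ["A", "A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "C", "A"]
report = evaluate_predictions(y_true, y_pred)
for name in ("accuracy", "macro_precision", "macro_recall", "macro_f1"):
    print(f"{name}: {report[name]:.3f}")
```

Macro averaging weights every class equally, so poor recall on a minority class lowers the score even when overall accuracy stays high. Comparing the per-class entries in `report["per_class"]` shows which class is responsible.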
