Article

Predicting breast cancer survivability based on machine learning and features selection algorithms: a comparative study

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12652-020-02590-y

Keywords

BI-RADS; Breast cancer (BC); Classification; Feature selection; Machine learning algorithms; WBC; WDBC; WPBC

Funding

  1. Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program

This study aims to identify breast cancer early using machine learning algorithms and feature selection methods. Experimental results indicate that classification with the random forest (RF) technique, using a Genetic Algorithm (GA) for feature selection, achieved the best accuracy on the WBC dataset, at 96.82%.
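This summary credits the WBC result to RF combined with GA feature selection but does not describe how the GA was configured. As an illustration only, here is a generic GA "wrapper" for feature selection. Each individual is a bit mask over the features, and its fitness is the accuracy of a classifier trained on the selected features and scored on a separate validation split. To stay self-contained with the Python standard library, the sketch makes three substitutions. It uses a simple nearest-centroid classifier instead of the paper's random forest. It runs on a tiny synthetic dataset. Its helper names (`nearest_centroid_accuracy`, `ga_select`, `make_toy_data`) are hypothetical. Population size, tournament size, mutation rate, and generation count are arbitrary assumptions.

```python
import random


def nearest_centroid_accuracy(train_x, train_y, val_x, val_y, mask):
    """Validation accuracy of a nearest-centroid classifier restricted to the
    features where mask[j] is True (a stand-in for RF, by assumption)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset is scored as worthless
    # Mean vector of each class over the selected columns.
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    correct = sum(predict(x) == y for x, y in zip(val_x, val_y))
    return correct / len(val_y)


def ga_select(train_x, train_y, val_x, val_y, n_features,
              pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve feature masks; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # memoize: the same mask often reappears
            cache[key] = nearest_centroid_accuracy(
                train_x, train_y, val_x, val_y, mask)
        return cache[key]

    # Random initial population of boolean masks.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n, rng):
    """Synthetic binary data: feature 0 separates the classes, 1-3 are noise."""
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        xs.append([y * 3.0 + rng.gauss(0, 0.5)]
                  + [rng.gauss(0, 1) for _ in range(3)])
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    data_rng = random.Random(42)
    train_x, train_y = make_toy_data(60, data_rng)
    val_x, val_y = make_toy_data(40, data_rng)
    mask, acc = ga_select(train_x, train_y, val_x, val_y, n_features=4)
    print("selected:", [j for j, keep in enumerate(mask) if keep], "acc:", acc)
```

Fitness is measured on a validation split rather than the final test set. This avoids the optimistic bias that comes from choosing features on the same data used to report accuracy. A real replication would plug the study's actual classifier and GA settings into `fitness`.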
Breast cancer (BC) is considered the most common cause of cancer deaths in women. This study aims to identify BC early using machine learning algorithms and feature selection methods. The overall methodology is adapted from the knowledge discovery in databases (KDD) process and comprises four datasets; a preprocessing phase (data cleaning and splitting the data into training and testing sets); a processing phase (feature selection, k-fold validation, and classification); and, finally, model evaluation. The paper compares several classifiers: decision tree (DT), random forest (RF), logistic regression (LR), Naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM). Four breast cancer datasets are used in the experiments: Wisconsin Prognostic Breast Cancer (WPBC), Wisconsin Diagnostic Breast Cancer (WDBC), Wisconsin Breast Cancer (WBC), and the Mammographic Mass Dataset (MM-Dataset), which is based on BI-RADS findings. The proposed models were evaluated using classification accuracy and the confusion matrix.

On the WBC dataset, RF with the Genetic Algorithm (GA) as the feature selection method outperforms the other classifiers, with an accuracy of 96.82%. On the WDBC dataset, C-SVM with a radial basis function (RBF) kernel is superior to the other classifiers, with an accuracy of 99.04%. On the WPBC dataset, RF with recursive feature elimination (RFE) as the feature selection method performs best, with an accuracy of 74.13%. On the MM-Dataset, DT performs best, with an accuracy of 83.74%. The findings indicate that the proposed models are effective compared with other existing models.
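The evaluation stage pairs k-fold validation with classification accuracy and the confusion matrix. The abstract does not give details such as the value of k or whether folds are stratified. Below is a minimal standard-library sketch of how these pieces fit together for a binary task such as benign vs. malignant. The function names (`k_fold_indices`, `confusion_matrix`, `cross_validate`) and the trivial `majority_class` baseline are hypothetical illustrations, not the authors' code. Any real classifier (DT, RF, SVM, ...) could be passed as `fit`, provided it returns a prediction function.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size
    (plain, non-stratified k-fold; an assumption)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_matrix(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(cm):
    """Accuracy = (TP + TN) / total, computed from the confusion matrix."""
    tp, fp, fn, tn = cm
    return (tp + tn) / (tp + fp + fn + tn)


def majority_class(train_x, train_y):
    """Placeholder classifier: always predicts the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all
    out-of-fold predictions into a single confusion matrix."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        y_true += [ys[i] for i in test_idx]
        y_pred += [model(xs[i]) for i in test_idx]
    cm = confusion_matrix(y_true, y_pred)
    return cm, accuracy(cm)


if __name__ == "__main__":
    xs = [[float(i)] for i in range(20)]
    ys = [1] * 5 + [0] * 15  # 5 positives, 15 negatives
    cm, acc = cross_validate(xs, ys, majority_class, k=5)
    print(cm, acc)  # prints (0, 0, 5, 15) 0.75
```

The usage example shows why accuracy is usually reported alongside the confusion matrix. The majority-class baseline reaches 75% accuracy on this imbalanced toy data while detecting none of the positive cases. Pooling out-of-fold predictions is one common choice. Averaging per-fold accuracies is another, and the two can differ slightly when folds have unequal sizes.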
