Article

Auxiliary Diagnosis of Breast Cancer Based on Machine Learning and Hybrid Strategy

Journal

IEEE ACCESS
Volume 11, Pages 96374-96386

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3312305

Keywords

Breast cancer; Sampling; Predictive models; Machine learning; Feature extraction; Data models; Classification algorithms; Clinical diagnosis; machine learning; sample balancing; feature selection; classification forecast


This study focuses on breast cancer and proposes a hybrid strategy combined with machine learning methods to build an accurate and efficient breast cancer auxiliary diagnosis model. Experimental results show that the new approach achieves better prediction results compared to previous methods.
Breast cancer has replaced lung cancer as the most common cancer among women worldwide. In this paper, we take breast cancer as the research object, propose a hybrid strategy for processing the data, and combine it with machine learning methods to build a more accurate and efficient auxiliary diagnosis model. First, the combined sampling method SMOTE-ENN is used to address sample imbalance, and the data are standardized to improve their separability. Then, the features of the dataset are screened initially with the mutual information method, followed by a second round of feature selection using recursive feature elimination based on the XGBoost algorithm. This reduces the feature dimensionality of the dataset and improves the generalization ability of the model. Finally, five different machine learning models are used for classification; the best combination of parameters for each model is found by grid search, and the final results of each model are obtained by 10-fold cross-validation. The experiments are conducted on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. After the data are processed by the hybrid strategy, the RF model gives the best prediction results, with an accuracy of 99.52%, which is better than previous research methods.
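The sample-balancing step can be illustrated with a small sketch. The code below is not the authors' implementation and does not use any library's SMOTE-ENN; it is a simplified, standard-library-only version written for illustration. It assumes a binary problem, Euclidean distance, and a majority-vote variant of edited nearest neighbours. All function names, the toy points, and the value of `k` are hypothetical.

```python
import math
import random
from collections import Counter


def k_nearest(point, points, k, skip_index=None):
    """Return indices of the (up to) k points closest to `point`."""
    dists = [
        (math.dist(point, p), i)
        for i, p in enumerate(points)
        if i != skip_index
    ]
    dists.sort()
    return [i for _, i in dists[:k]]


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority samples until both classes have equal size.

    Each synthetic sample is a random interpolation between a minority
    point and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(minority_pts[i], minority_pts, k, skip_index=i))
        gap = rng.random()
        a, b = minority_pts[i], minority_pts[j]
        X_new.append(tuple(u + gap * (v - u) for u, v in zip(a, b)))
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in k_nearest(x, X, k, skip_index=i))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean the result with ENN."""
    X_bal, y_bal = smote(X, y, minority, k=k, rng=random.Random(seed))
    return edited_nearest_neighbours(X_bal, y_bal, k=k)


# Toy data (hypothetical): 8 clean majority points, 1 majority-labelled
# noise point sitting inside the minority cluster, and 3 minority points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.1, 0.4), (0.4, 0.1), (0.2, 0.3), (0.3, 0.0),
     (5.1, 5.1),
     (5.0, 5.0), (5.2, 5.1), (5.1, 5.3)]
y = [0] * 9 + [1] * 3

X_res, y_res = smote_enn(X, y, minority=1)
print(Counter(y_res))  # 8 majority and 9 minority samples remain
```

In this toy run, SMOTE raises the minority class from 3 to 9 samples, and the ENN pass then removes the single majority-labelled point that lies inside the minority cluster. On real data such as WDBC, a maintained library implementation would be the more sensible choice than this sketch.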
