Article

An intelligent decision support system for the accurate diagnosis of cervical cancer

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 245

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.108634

Keywords

Machine learning; Decision support system; Cervical cancer; Genetic algorithm; SMOTE

This paper presents an intelligent decision support system for cervical cancer diagnosis using risk factors from a publicly available dataset. A novel hybrid resampling technique is proposed to address class imbalance, while a Genetic Algorithm (GA) is applied to identify key risk factors. The combination of the two provides the best possible performance for cervical cancer diagnosis.
Cervical cancer is the fourth most commonly diagnosed cancer and one of the leading causes of cancer-related deaths among females worldwide. In this paper, we present an intelligent decision support system for the diagnosis of cervical cancer using risk factors outlined in a publicly available dataset. The dataset contains a large imbalance between positive and negative instances. Although sampling techniques can address this, the imbalance is so severe that oversampling or undersampling alone is insufficient to create an adequate balance between the classes, which is crucial for appropriate diagnosis. Hence, we propose a novel resampling technique that hybridizes oversampling and undersampling to induce a proper balance between the two classes. The hybrid strategy ensures that neither the majority class nor the minority class suffers from a reduction in performance or is overfitted, as would be the case if oversampling or undersampling were used alone. To further enhance the performance of the classifiers, a Genetic Algorithm (GA) is applied to identify the key risk factors for cervical cancer diagnosis. Using the optimized feature set of only 8 of the 32 features, selected by the GA, the Random Forest classifier achieved the maximum G-mean score of 94.47%, along with a sensitivity and specificity of 94.25% and 94.69%, respectively. Thus, our proposed hybrid resampling strategy effectively addresses class imbalance, while the GA identifies the most important features to maximize class separation, and the combination of the two provides the best possible performance for the diagnosis of cervical cancer. © 2022 Elsevier B.V. All rights reserved.
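
The two ideas the abstract relies on can be illustrated with a minimal Python sketch. It makes two assumptions that are not taken from the paper. First, G-mean is computed with its standard definition for binary classification, the geometric mean of sensitivity and specificity. Second, plain random sampling stands in for the oversampling and undersampling components, since the abstract does not describe the authors' actual hybrid procedure. The function names, the toy class sizes, and the `target_size` parameter are hypothetical.

```python
import math
import random


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (standard definition)."""
    return math.sqrt(sensitivity * specificity)


def hybrid_resample(majority, minority, target_size, seed=0):
    """Bring both classes to `target_size` rows.

    Assumption: random undersampling and random duplication are used here
    only as stand-ins; this is not the resampling method from the paper.
    """
    if not len(minority) <= target_size <= len(majority):
        raise ValueError("target_size must lie between the two class sizes")
    rng = random.Random(seed)
    # Undersampling: keep a random subset of the majority class, no repeats.
    kept_majority = rng.sample(majority, target_size)
    # Oversampling: keep every minority row, then add random duplicates.
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return kept_majority, minority + extra


# Sensitivity and specificity reported in the abstract for Random Forest.
reported = g_mean(0.9425, 0.9469)
print(f"G-mean: {reported:.2%}")  # prints "G-mean: 94.47%"

# Toy imbalanced data with hypothetical sizes (not the real dataset).
negatives = [("neg", i) for i in range(800)]
positives = [("pos", i) for i in range(50)]
balanced_neg, balanced_pos = hybrid_resample(negatives, positives, target_size=300)
print(len(balanced_neg), len(balanced_pos))  # prints "300 300"
```

The first printed value agrees with the 94.47% G-mean quoted in the abstract. Meeting in the middle, rather than inflating the minority class to the full majority size or shrinking the majority class to the minority size, reflects the abstract's argument that neither sampling direction should carry the whole correction.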
