Article

Classification of lung cancer stages with machine learning over big data healthcare framework

Journal

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12652-020-02071-2

Keywords

Benign; Health care; Machine learning; Malignant; Map-reduce; Sputum

This study demonstrates the effectiveness of combining machine learning algorithms and Apache Spark to design an architecture for effective classification of lung cancer images and lesion severity, showcasing the superiority of SVM in this arena and achieving promising results.
Big data healthcare frameworks are growing quickly, and lung cancer must be detected accurately at early stages; machine learning can serve both needs. This paper combines machine learning algorithms with Apache Spark in an architecture for classifying lung cancer images and cancer stages. We experiment with a combination of binary classification (a non-linear SVM with a radial basis function (RBF) kernel) and multi-class classification (WTA-SVM, a winner-takes-all support vector machine) with a threshold technique (T-BMSVM), classifying nodules as malignant or benign and grading their malignancy levels, respectively. The dataset consists of sputum cell images collected from microscope lab images. We argue for handling and processing large sets of sputum cell images with the map-reduce framework in MATLAB and PySpark, which works better with Apache Spark. Our approach outperforms the other methods, remaining stable with a minimum error rate even as the dataset size grows sharply. It achieves 86% accuracy and an AUC of 0.88; together with the misclassification rate, these metrics show that the Support Vector Machine (SVM) outperforms other classifiers. These outcomes indicate that features are extracted successfully from the lung cancer images and that SVM combined with binary classification works well, while multi-class classification works better than the binary SVM. The approach may therefore be considered a promising tool for diagnosing the stages of nodules and classifying the severity of cancer. Scalability and convergence analyses are also included to demonstrate the improvement of multi-class classification over SVM.
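The two-step decision described in the abstract (a binary RBF-kernel SVM followed by a winner-takes-all choice among per-stage SVMs) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the threshold in T-BMSVM is applied, so the sketch assumes it gates the binary decision score. All names (`classify_nodule`, `binary_model`, `stage_models`) and every support vector, coefficient, and feature value are hypothetical toy values, not results from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_score(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision value f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    kernel_sum = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return kernel_sum + bias


def classify_nodule(x, binary_model, stage_models, threshold=0.0):
    """Step 1: benign vs. malignant by thresholding the binary score
    (assumed role of the threshold). Step 2: for malignant nodules,
    winner-takes-all over one-vs-rest stage models: the stage whose
    SVM returns the highest score wins."""
    if svm_score(x, **binary_model) <= threshold:
        return "benign", None
    scores = {stage: svm_score(x, **m) for stage, m in stage_models.items()}
    return "malignant", max(scores, key=scores.get)


# Hypothetical, hand-picked models over a 2-D feature vector.
binary_model = dict(
    support_vectors=[(0.0, 0.0), (2.0, 2.0)],
    dual_coefs=[-1.0, 1.0],  # negative weight = benign support vector
    bias=0.0,
    gamma=0.5,
)
stage_models = {
    "stage_1": dict(support_vectors=[(2.0, 2.0)], dual_coefs=[1.0],
                    bias=-0.5, gamma=0.5),
    "stage_2": dict(support_vectors=[(3.0, 3.0)], dual_coefs=[1.0],
                    bias=-0.5, gamma=0.5),
}

print(classify_nodule((0.1, 0.1), binary_model, stage_models))
print(classify_nodule((2.0, 2.1), binary_model, stage_models))
print(classify_nodule((3.0, 2.9), binary_model, stage_models))
```

In practice the support vectors and coefficients would come from training on features extracted from the sputum cell images. The point here is only the control flow: the multi-class step runs only for nodules that the binary step flags as malignant.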
