Article

A Novel Scalable Kernelized Fuzzy Clustering Algorithms Based on In-Memory Computation for Handling Big Data

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TETCI.2020.3016302

Keywords

Clustering algorithms; Kernel; Big Data; Optimization; Kernelized clustering algorithms; Nonlinear separable; In-memory computation

Funding

  1. Council of Scientific and Industrial Research [22(0750)/17/EMR-II]

The study proposes a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) clustering algorithm for efficiently clustering non-linearly separable data in a Big Data framework. Experimental results show that KSRSIO-FCM achieves significant improvements over other scalable clustering algorithms in time and space complexity and in clustering quality metrics.
Traditional scalable clustering algorithms mainly handle linearly separable data, and clustering non-linearly separable data efficiently in the feature space remains challenging. In this article, we propose a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) clustering algorithm built on a Big Data framework. As an integral part of KSRSIO-FCM, we also propose a kernelized version of the Scalable Literal Fuzzy c-Means clustering algorithm (KSLFCM). These kernelized clustering algorithms handle non-linearly separable problems by applying a Radial Basis Function (RBF) kernel, which maps the input data space non-linearly into a high-dimensional feature space. We design and implement the kernelized fuzzy clustering algorithms on Apache Spark, whose in-memory cluster computing makes the clustering of Big Data efficient. Exhaustive experiments on various big datasets show the effectiveness of the proposed KSRSIO-FCM in comparison with other scalable clustering algorithms, i.e., KSLFCM, SRSIO-FCM, and SLFCM. The reported results show that KSRSIO-FCM achieves significant improvement over KSLFCM, SRSIO-FCM, and SLFCM in terms of time and space complexity, Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F-score. Furthermore, we carry out a performance analysis of KSRSIO-FCM versus KSLFCM. The reported results thus show that KSRSIO-FCM implemented on Apache Spark has better potential for Big Data clustering than traditional scalable fuzzy clustering methods.
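
To make the role of the RBF kernel concrete, here is a minimal single-machine sketch of a kernelized fuzzy c-means loop in plain Python. It is not the KSRSIO-FCM or KSLFCM algorithm from the article: it uses one common textbook formulation in which the squared Euclidean distance is replaced by the kernel-induced distance 1 - K(x, v) and the cluster centers stay in the input space, and it leaves out the random sampling and Apache Spark parts entirely. The function names, the parameters (m, sigma, tol, seed), and the random choice of initial centers are assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, v, sigma):
    """Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def update_memberships(points, centers, m, sigma, eps=1e-12):
    """Fuzzy memberships based on the kernel-induced distance 1 - K(x, v)."""
    memberships = []
    for x in points:
        # eps keeps the distance positive when a point sits exactly on a center.
        weights = [
            max(1.0 - rbf_kernel(x, v, sigma), eps) ** (-1.0 / (m - 1.0))
            for v in centers
        ]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships


def update_centers(points, centers, memberships, m, sigma):
    """Each center becomes a mean of the points weighted by u^m * K(x, v)."""
    dim = len(points[0])
    new_centers = []
    for i, v in enumerate(centers):
        coeffs = [
            (memberships[k][i] ** m) * rbf_kernel(x, v, sigma)
            for k, x in enumerate(points)
        ]
        denom = sum(coeffs)
        if denom == 0.0:
            # All kernel weights underflowed; keep the previous center.
            new_centers.append(list(v))
            continue
        new_centers.append(
            [sum(c * x[d] for c, x in zip(coeffs, points)) / denom
             for d in range(dim)]
        )
    return new_centers


def kernel_fcm(points, n_clusters, m=2.0, sigma=1.0, max_iter=100,
               tol=1e-6, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    # Assumption: initial centers are distinct points drawn at random.
    centers = [list(p) for p in rng.sample(list(points), n_clusters)]
    for _ in range(max_iter):
        memberships = update_memberships(points, centers, m, sigma)
        new_centers = update_centers(points, centers, memberships, m, sigma)
        shift = max(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m, sigma)
```

Calling kernel_fcm(points, n_clusters=2, sigma=1.0) on a list of coordinate tuples returns the centers and one membership row per point, with each row summing to 1. The result depends on sigma and on the initial centers, so a poor choice of either can leave several centers in the same group of points.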
