Article

Random Partition Based Adaptive Distributed Kernelized SVM for Big Data

Journal

IEEE ACCESS
Volume 10, Pages 95623-95637

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/ACCESS.2022.3204114

Keywords

Support vector machines; Distributed databases; Training data; Big data; Data models; Optimization; Learning systems; Distributed processing; Storage management; Classification algorithms; Distributed learning; Large datasets; SVM; Classification; Distributed storage

Abstract
In this paper, we present a distributed classification technique for big data that efficiently uses the distributed storage architecture and data processing units of a cluster. When handling such large data, existing approaches rely on specific data partitioning techniques that demand the complete data be processed before partitioning. This leads to an excessive overhead of computation and data communication. The proposed method does not require any pre-structured data partitioning technique and is also adaptive to big data mining tools. We hypothesize that an effective aggregation of the information generated from data partitions by subprocesses of the complete learning process can lead to accurate prediction results while reducing the overall time complexity. We build three SVM-based classifiers, namely one-phase voting SVM (1PVSVM), two-phase voting SVM (2PVSVM), and similarity-based SVM (SIMSVM). Each of these classifiers utilizes the support vectors as the local information to construct the synthesized learner, efficiently reducing the training time and ensuring minimal communication between processing units. An extensive empirical analysis demonstrates the effectiveness of our classifiers compared to other existing approaches on several benchmark datasets. Among the existing methods and our three proposed methods (1PVSVM, 2PVSVM, and SIMSVM), SIMSVM is the most efficient. On the MNIST dataset, SIMSVM achieves an average speedup ratio of 0.78 and a minimum scalability of 73% when the data size is scaled up to 10 times, while retaining high accuracy (99%) similar to centralized approaches.

