Article

A new parallel data geometry analysis algorithm to select training data for support vector machine

Journal

AIMS MATHEMATICS
Volume 6, Issue 12, Pages 13931-13953

Publisher

AMER INST MATHEMATICAL SCIENCES-AIMS
DOI: 10.3934/math.2021806

Keywords

support vector machine; sample reduction; geometry analysis; Mahalanobis distance; parallel

This paper proposes a new parallel data geometry analysis (PDGA) algorithm for the support vector machine (SVM). It introduces Mahalanobis distance and cosine angle distance analysis to reduce the SVM training set and improve training efficiency without sacrificing classification accuracy. The algorithm is implemented in parallel, significantly reducing training time and memory requirements, and outperforms five competing algorithms.
Support vector machine (SVM) is one of the most powerful machine learning technologies and has attracted wide attention because of its remarkable performance. However, when dealing with the classification of large-scale datasets, the high complexity of the SVM model leads to low efficiency and makes training impractical. Exploiting the sparsity of SVM in the sample space, this paper presents a new parallel data geometry analysis (PDGA) algorithm to reduce the training set of SVM, which helps to improve the efficiency of SVM training. PDGA introduces the Mahalanobis distance to measure the distance from each sample to its class centroid. Based on this, a method is proposed that identifies non-support vectors and outliers at the same time, helping to remove redundant data. To reduce the training set further, a cosine angle distance analysis method is proposed to determine whether samples are redundant, ensuring that valuable data are not removed. Different from previous data geometry analysis methods, the PDGA algorithm is implemented in parallel, which greatly reduces the computational cost. Experimental results on an artificial dataset and six real datasets show that the algorithm adapts to different sample distributions, significantly reduces training time and memory requirements without sacrificing classification accuracy, and clearly outperforms five competing algorithms.
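
To make the two geometric criteria concrete, the sketch below applies them to one class of training data: a Mahalanobis-distance filter that discards the innermost samples (likely non-support vectors) and the outermost ones (likely outliers), followed by a cosine angle check that drops near-collinear duplicates. This is a minimal Python illustration of the ideas named in the abstract, not the authors' PDGA implementation; the quantile cut-offs, the cosine threshold, and the per-class multiprocessing scheme are all illustrative assumptions.

import numpy as np
from multiprocessing import Pool

def mahalanobis_to_centroid(X):
    # Distance from every row of X to the class centroid, scaled by the
    # inverse covariance (pinv guards against a singular covariance).
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

def reduce_class(X, inner_q=0.10, outer_q=0.98, cos_thresh=0.999):
    # Step 1: Mahalanobis filter. Samples closest to the centroid are
    # treated as likely non-support vectors, the farthest as outliers;
    # both quantile cut-offs are assumed tuning parameters.
    d = mahalanobis_to_centroid(X)
    lo, hi = np.quantile(d, [inner_q, outer_q])
    X = X[(d >= lo) & (d <= hi)]
    # Step 2: cosine angle check. Normalize directions from the centroid
    # and drop any sample nearly collinear with one already kept.
    V = X - X.mean(axis=0)
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    kept = []
    for i in range(len(V)):
        if all(V[i] @ V[j] < cos_thresh for j in kept):
            kept.append(i)
    return X[kept]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic Gaussian classes; reducing each class independently
    # is embarrassingly parallel, echoing the paper's parallel design.
    classes = [rng.normal(c, 1.0, size=(2000, 5)) for c in (0.0, 3.0)]
    with Pool(2) as pool:
        reduced = pool.map(reduce_class, classes)
    print("reduced class sizes:", [len(r) for r in reduced])

After reduction, a standard SVM (for example, sklearn.svm.SVC) could be trained on the stacked reduced classes with the corresponding labels; the point of the reduction is that this training step sees far fewer samples.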
