Article

An Accurate and Efficient Large-Scale Regression Method Through Best Friend Clustering

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume 33, Issue 11, Pages 3129-3140

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2021.3134336

Keywords

Clustering algorithms; Training; Mathematical models; Computational modeling; Libraries; Kernel; Support vector machines; Distributed machine learning; scalable algorithm; large-scale clustering; parallel regression

Funding

  1. National Natural Science Foundation of China [61972376, 62072431, 62032023]
  2. Science Foundation of Beijing [L182053]


In this article, a novel data structure is proposed that captures the most important information among data samples and supports a hierarchical clustering strategy. A parallel library combining clustering and regression techniques accelerates computation and improves accuracy.
As data sizes in machine learning grow exponentially, it becomes necessary to accelerate computation by exploiting the ever-growing number of cores provided by high-performance computing hardware. However, existing parallel methods for clustering or regression often suffer from low accuracy, slow convergence, and complex hyperparameter tuning. Furthermore, parallel efficiency is difficult to improve while balancing the preservation of model properties against the partitioning of computing workloads on distributed systems. In this article, we propose a novel and simple data structure that captures the most important information among data samples. It has several advantageous properties that support a hierarchical clustering strategy: well-defined metrics for determining the optimal hierarchy, balanced partitioning that maintains the clustering property, and efficient parallelization that accelerates the computation phases. We then combine this clustering with regression techniques in a parallel library and use a hybrid of data and model parallelism to make predictions. Experiments show that our library achieves remarkable convergence, accuracy, and scalability.
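The title's "best friend" notion suggests a mutual nearest-neighbor relation between samples. As a minimal illustrative sketch of that idea (not the authors' library or implementation; the function names `best_friends` and `mutual_pairs` are hypothetical), the code below pairs samples that are each other's nearest neighbor, the kind of seed pairing a hierarchical clustering pass could merge upward from:

```python
import numpy as np

def best_friends(X):
    """Index of each sample's nearest neighbor (its 'best friend')."""
    # Pairwise Euclidean distances via broadcasting: d[i, j] = ||X[i] - X[j]||.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own friend
    return d.argmin(axis=1)

def mutual_pairs(friends):
    """Pairs (i, j), i < j, whose members are each other's best friend."""
    return [(i, int(j)) for i, j in enumerate(friends)
            if friends[j] == i and i < j]

# Two tight groups: samples 0/1 and 2/3 are mutual nearest neighbors.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
pairs = mutual_pairs(best_friends(X))  # [(0, 1), (2, 3)]
```

Such mutual pairs can then be treated as merged nodes and the process repeated, yielding a hierarchy without a user-specified cluster count; a production version would use a k-d tree or distributed neighbor search rather than the O(n²) distance matrix shown here.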

