JAMPI: Efficient Matrix Multiplication in Spark Using Barrier Execution Mode

BIG DATA AND COGNITIVE COMPUTING (2020)

Journal

BIG DATA AND COGNITIVE COMPUTING

Volume 4, Issue 4

Publisher

MDPI

DOI: 10.3390/bdcc4040032

Keywords

Apache Spark; distributed computing; distributed matrix algebra; deep learning; matrix primitives; 68W15

Abstract

The new barrier mode in Apache Spark allows distributed deep learning training to be embedded as a Spark stage, which simplifies the distributed training workflow. In Spark, a task in a stage does not depend on any other task in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communication, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK's new auto-vectorization, and Spark's barrier execution mode, we can add non-map/reduce-based algorithms, such as Cannon's distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon's algorithm, which significantly improves on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein yields a performance increase of up to 24% on a 10,000 × 10,000 square matrix, with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network-based workloads, and thus such efficient algorithms can play a ground-breaking role in the faster and more efficient execution of even the most complicated machine learning tasks.
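To make the data movement in Cannon's algorithm concrete, the following is a minimal single-process sketch in plain Python. It is not the JAMPI implementation and uses neither Spark nor message passing: the q × q grid of "tasks" is simulated with nested lists, and the block shifts that a real implementation would send over the network are done by re-indexing. The function names (`split`, `join`, `cannon`) and the grid size `q` are illustrative assumptions, not names from the paper.

```python
def matmul(a, b):
    """Plain dense product of two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def join(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for blk in block_row for v in blk[r]])
    return out


def cannon(a, b, q):
    """Multiply square matrices a and b on a simulated q x q task grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size q")
    s = n // q
    A, B = split(a, q), split(b, q)
    # Initial skew: block row i of A shifts left by i, block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each simulated task multiplies its local blocks and accumulates the result.
        C = [[add(C[i][j], matmul(A[i][j], B[i][j])) for j in range(q)] for i in range(q)]
        # Shift A one block to the left and B one block up (the communication step).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join(C)
```

Calling `cannon(a, b, 2)` on two 4 × 4 matrices should return the same result as `matmul(a, b)`. In a distributed setting each grid cell holds only its current blocks of A, B, and C, and every round requires all tasks to exchange blocks at the same time, which is the kind of coordinated inter-task communication the abstract attributes to barrier execution mode.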
