Journal
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
Volume 129, Issue -, Pages 83-98
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2017.10.013
Keywords
Sparse tensors; Decomposition; Parallel; Performance analysis; Canonical decomposition/parallel factorization; Alternating least squares
Abstract
Tensor decomposition, the higher-order analogue of the singular value decomposition, has emerged as a useful tool for finding relationships in large, sparse, multidimensional data. As this technique matures and is applied to increasingly large data sets, the need for high-performance implementations becomes critical. A better understanding of the performance characteristics of tensor decomposition on large, sparse tensors can help drive the development of such implementations. In this work, we perform an objective empirical evaluation of three state-of-the-art parallel tools that implement the Canonical Decomposition/Parallel Factorization tensor decomposition algorithm using alternating least squares fitting (CP-ALS): SPLATT, DFacTo, and ENSIGN. We conduct performance studies across a variety of data sets and evaluate the tools with respect to total memory required, processor stall cycles, execution time, data distribution, and communication patterns. Furthermore, we investigate the performance of the implementations on tensors with up to 6 dimensions and when executing high-rank decompositions. We find that tensor data structure layout and distribution choices can result in differences as large as 14.6x in memory usage and 39.17x in execution time. We provide an outline of a distributed heterogeneous CP-ALS implementation that addresses the performance issues we observe. Published by Elsevier Inc.
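For readers unfamiliar with CP-ALS, the following is a minimal illustrative sketch in plain Python, not the implementation used by SPLATT, DFacTo, or ENSIGN. It assumes a 3-way tensor stored as a list of coordinate (COO) nonzeros and is restricted to rank 1, so that the least-squares solve for each factor reduces to a scalar division. The function name `rank1_cp_als`, its parameters, and the all-ones initialization are hypothetical choices made for this example only.

```python
import math


def rank1_cp_als(nonzeros, shape, iters=20):
    """Rank-1 CP-ALS on a 3-way sparse tensor in coordinate (COO) form.

    nonzeros: list of ((i, j, k), value) entries; unlisted entries are zero.
    shape:    (I, J, K) dimensions of the tensor.
    Returns (weight, [a, b, c]) where a, b, c are unit-norm factor vectors.
    """
    # Deterministic all-ones start (an assumption made for reproducibility).
    factors = [[1.0] * n for n in shape]
    weight = 0.0
    for _ in range(iters):
        for mode in range(3):
            others = [m for m in range(3) if m != mode]
            # Accumulate each nonzero times the entries of the two fixed
            # factors; only stored nonzeros are visited.
            acc = [0.0] * shape[mode]
            for idx, val in nonzeros:
                prod = val
                for m in others:
                    prod *= factors[m][idx[m]]
                acc[idx[mode]] += prod
            # Least-squares solve: for rank 1 the Gram matrix is the scalar
            # product of the squared norms of the two fixed factors.
            gram = 1.0
            for m in others:
                gram *= sum(x * x for x in factors[m])
            new = [x / gram for x in acc]
            # Normalize the updated factor and keep its scale in `weight`.
            # (A zero tensor would give weight == 0; not handled here.)
            weight = math.sqrt(sum(x * x for x in new))
            factors[mode] = [x / weight for x in new]
    return weight, factors
```

Calling `rank1_cp_als(nonzeros, (I, J, K))` returns a scalar weight and three unit-norm vectors whose scaled outer product approximates the input tensor. Implementations that support higher ranks replace the scalar division with a small dense linear solve per mode, and the way the nonzeros are stored and traversed in the accumulation loop is the kind of data structure layout choice the abstract refers to.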