Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 30, Issue 6, Pages 1136-1149
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2017.2785326
Keywords
NMF; block-wise updates; frequent updates; lazy updates; concurrent updates; MapReduce
Funding
- US National Science Foundation [CNS-1217284, CCF-1018114, CCF-1017828]
Nonnegative Matrix Factorization (NMF) has been applied with great success to a wide range of applications. As NMF is increasingly applied to massive datasets such as web-scale dyadic data, it is desirable to leverage a cluster of machines to store those datasets and to speed up the factorization process. However, it is challenging to implement NMF efficiently in a distributed environment. In this paper, we show that by leveraging a new form of update functions, we can perform local aggregation and fully exploit parallelism. Therefore, the new form is much more efficient than the traditional form in distributed implementations. Moreover, under the new form of update functions, we can perform frequent updates and lazy updates, which aim to use the most recently updated data whenever possible and to avoid unnecessary computations. As a result, frequent updates and lazy updates are more efficient than their traditional concurrent counterparts. Through a series of experiments on a local cluster as well as the Amazon EC2 cloud, we demonstrate that our implementations with frequent updates or lazy updates are up to two orders of magnitude faster than the existing implementation with the traditional form of update functions.
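For orientation, the following is a minimal single-machine sketch of the classic multiplicative update rules for NMF (commonly attributed to Lee and Seung), which factor a nonnegative matrix V into W H by reducing the squared Frobenius error. It is an assumption that this is what the abstract means by the "traditional form" of update functions; the sketch does not reproduce the paper's new form, its block-wise, frequent, or lazy updates, or any MapReduce implementation. The helper names (`matmul`, `transpose`, `frobenius_error`, `nmf_step`, `nmf`) are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_step(v, w, h, eps=1e-12):
    """One round of multiplicative updates: H first, then W using the new H."""
    wt = transpose(w)
    num_h = matmul(wt, v)              # W^T V
    den_h = matmul(matmul(wt, w), h)   # W^T W H
    h = [[h[k][j] * num_h[k][j] / (den_h[k][j] + eps)
          for j in range(len(h[0]))] for k in range(len(h))]

    ht = transpose(h)
    num_w = matmul(v, ht)              # V H^T
    den_w = matmul(w, matmul(h, ht))   # W H H^T
    w = [[w[i][k] * num_w[i][k] / (den_w[i][k] + eps)
          for k in range(len(w[0]))] for i in range(len(w))]
    return w, h


def nmf(v, rank, iterations=50, seed=0):
    """Factor V (m x n) into W (m x rank) and H (rank x n), tracking the error."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive initial factors; the updates keep entries nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        w, h = nmf_step(v, w, h)
        errors.append(frobenius_error(v, w, h))
    return w, h, errors
```

In this sketch each update of W needs the full products V H^T and H H^T, and each update of H needs W^T V and W^T W, so every round touches the whole matrix. This is the kind of global dependency that makes a direct distributed implementation costly, which is the setting the abstract addresses.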