4.4 Article

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

期刊

PROCEEDINGS OF THE VLDB ENDOWMENT
卷 10, 期 5, 页码 469-480

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.14778/3055540.3055541

关键词

-

资金

  1. MIUR of Italy
  2. University of Padova [CPDA152255/15]
  3. NSF [IIS-1247581]
  4. NIH [R01-CA180776]
  5. Direct For Computer & Info Scie & Enginr
  6. Div Of Information & Intelligent Systems [1247581] Funding Source: National Science Foundation

向作者/读者索取更多资源

Given a dataset of points in a metric space and an integer k, a diversity maximization problem requires determining a subset of k points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an (alpha+epsilon)approximation ratio, for any constant epsilon > 0, where alpha is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.4
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据