4.5 Article

Provable randomized rounding for minimum-similarity diversification

期刊

DATA MINING AND KNOWLEDGE DISCOVERY
卷 36, 期 2, 页码 709-738

出版社

SPRINGER
DOI: 10.1007/s10618-021-00811-2

关键词

Diversification; Recommender systems; Randomized rounding; Quadratic programming

资金

  1. Academy of Finland project AIDA [317085]
  2. Academy of Finland project MLDB [325117]
  3. ERC Advanced Grant REBOUND [834862]
  4. EC H2020 RIA project SoBigData++ [871042]
  5. Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation
  6. European Research Council (ERC) [834862] Funding Source: European Research Council (ERC)
  7. Academy of Finland (AKA) [325117, 325117] Funding Source: Academy of Finland (AKA)

向作者/读者索取更多资源

When searching for information in a data collection, it is often important to not only find relevant items but also assemble a diverse set to explore different concepts in the data. This paper addresses the problem of finding a diverse set of items when item relatedness is measured by a similarity function. The authors propose a new minimization objective and employ a randomized rounding strategy to find good solutions efficiently. They also introduce a novel bound for the ratio of Poisson-Binomial densities, which has applications beyond this problem. The proposed algorithm outperforms greedy approaches commonly used in the literature according to experiments on benchmark datasets.
When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据