期刊
VLDB JOURNAL
卷 25, 期 4, 页码 449-472出版社
SPRINGER
DOI: 10.1007/s00778-016-0424-7
关键词
Data stream algorithms; Quantiles; Ordinary least squares
资金
- HKRGC [GRF-621413, GRF-16211614, GRF-16200415]
- Microsoft [MRA14EG05]
- European Research Council [ERC-2014-CoG 647557]
- Yahoo Faculty Research and Engagement Program
- Royal Society-Wolfson Research Merit Award
A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data are given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be extremely useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describe efficient implementations. In doing so, we propose new variants that have not been studied before, yet which outperform existing methods. To illustrate this, we provide detailed experimental comparisons demonstrating the trade-offs between space, time, and accuracy for quantile computation.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据