3.8 Proceedings Paper

An Effective Single-Pass Approach for Estimating the Φ-quantile in Data Streams

出版社

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-030-95384-3_48

关键词

Quantile estimation; Phi-quantile; Data stream; Single pass

资金

  1. High level talent foundation of Henan University of Technology [2018BS057]

向作者/读者索取更多资源

This paper proposes a novel method for estimating the Phi-quantile item in a data stream. The method achieves high accuracy and requires minimal extra space by dividing the data stream into groups, selecting extreme values from each group, and summarizing them to obtain the estimated quantiles. The method allows for estimating the Phi-quantile in a single pass and ensures the expected rank of the result item matches the desired rank.
Random sampling is a common method to deal with large-scale data sets and in particular to deal with data streams. However, the accuracy of this method decreases greatly with the reduction of sampling size when dealing with quantile-related problems. It is known that Omega(epsilon(-1) log log delta(-1)) space are required, to estimate the rank of any query item up to additive error epsilon n with probability of at least 1 - delta, where n is the total number of items in the data stream. In many cases, however, we cannot predict the scale of a data stream in advance. This paper proposes a novel approach for estimating the Phi-quantile item in a data stream. The F-quantile result of the approach has high accuracy, and the required extra space is very little. Specifically, the general idea is as follows: firstly, we properly divide the items in the data stream into several groups, then the extreme values in each group are properly selected, and the estimated quantiles are finally obtained based on summarizing each extreme value. In the above process, we carefully calculate the proper number of groups, the extreme value of each group, and design the summarizing of the extreme values, to estimate the Phi-quantile in the data stream with a single-pass. At the same time, ensuring that the expected rank of the result item is exactly equal to the rank of the desired item. A preliminary analysis of the cost of the proposed approach is also given. Moreover, an extensive empirical study using synthetic data sets is reported to verify the effectiveness of the methods.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

3.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据