☆ 3.8 Proceedings Paper

An Effective Single-Pass Approach for Estimating the Φ-quantile in Data Streams

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT I (2022)

期刊

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT I

卷 13155, 期 -, 页码 775-789

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

DOI: 10.1007/978-3-030-95384-3_48

关键词

Quantile estimation; Phi-quantile; Data stream; Single pass

类别

Computer Science, Software Engineering Computer Science, Theory & Methods

资金

High level talent foundation of Henan University of Technology [2018BS057]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This paper proposes a novel method for estimating the Phi-quantile item in a data stream. The method achieves high accuracy and requires minimal extra space by dividing the data stream into groups, selecting extreme values from each group, and summarizing them to obtain the estimated quantiles. The method allows for estimating the Phi-quantile in a single pass and ensures the expected rank of the result item matches the desired rank.

Random sampling is a common method to deal with large-scale data sets and in particular to deal with data streams. However, the accuracy of this method decreases greatly with the reduction of sampling size when dealing with quantile-related problems. It is known that Omega(epsilon(-1) log log delta(-1)) space are required, to estimate the rank of any query item up to additive error epsilon n with probability of at least 1 - delta, where n is the total number of items in the data stream. In many cases, however, we cannot predict the scale of a data stream in advance. This paper proposes a novel approach for estimating the Phi-quantile item in a data stream. The F-quantile result of the approach has high accuracy, and the required extra space is very little. Specifically, the general idea is as follows: firstly, we properly divide the items in the data stream into several groups, then the extreme values in each group are properly selected, and the estimated quantiles are finally obtained based on summarizing each extreme value. In the above process, we carefully calculate the proper number of groups, the extreme value of each group, and design the summarizing of the extreme values, to estimate the Phi-quantile in the data stream with a single-pass. At the same time, ensuring that the expected rank of the result item is exactly equal to the rank of the desired item. A preliminary analysis of the cost of the proposed approach is also given. Moreover, an extensive empirical study using synthetic data sets is reported to verify the effectiveness of the methods.

An Effective Single-Pass Approach for Estimating the Φ-quantile in Data Streams

期刊

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT I

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

An Effective Single-Pass Approach for Estimating the Φ-quantile in Data Streams

期刊

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT I

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文