Article

Estimating Cardinality for Arbitrarily Large Data Stream With Improved Memory Efficiency

Journal

IEEE/ACM Transactions on Networking
Volume 28, Issue 2, Pages 433-446

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TNET.2020.2970860

Keywords

Data streams; cardinality estimation; random hashing

Funding

  1. National Natural Science Foundation of China [61872080]
  2. Jiangsu Provincial Key Laboratory of Computer Networking Technology
  3. National Science Foundation of the United States [STC-1562485, CNS-1719222]
  4. Florida Cybersecurity Center

Abstract

Cardinality estimation is the task of determining the number of distinct elements (the cardinality) in a data stream, under the stringent constraint that the stream can be scanned in only a single pass. This is a fundamental problem with many practical applications, such as traffic monitoring in high-speed networks and query optimization in Internet-scale databases. To solve it, we propose an algorithm named HLL-TailCut, which achieves a standard estimation error of $1.0/\sqrt{m}$ using memory units of only four or three bits each, substantially smaller than the five-bit memory units used by HyperLogLog, the best previously known cardinality estimator. This makes it possible to reduce the memory cost of HyperLogLog by 20%-45%. For example, when the target estimation error is 1.1%, state-of-the-art HyperLogLog needs 5.6 kilobytes of memory, whereas our new algorithm needs only 3 kilobytes to attain the same accuracy. Additionally, our algorithm supports the estimation of very large stream cardinalities, even at the tera and peta scale.
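
For context, the following is a minimal Python sketch of the standard HyperLogLog estimator that HLL-TailCut builds on. It is the five-bit-register baseline the abstract compares against, not the paper's algorithm; the hash function and parameter choices here are illustrative assumptions.

    import hashlib

    def hll_estimate(stream, b=10):
        """Classic HyperLogLog (Flajolet et al., 2007): m = 2^b registers,
        standard error about 1.04/sqrt(m). HLL-TailCut shrinks each register
        from 5 bits to 4 or 3 bits while keeping the error near 1.0/sqrt(m)."""
        m = 1 << b
        registers = [0] * m
        for item in stream:
            # 64-bit hash of the element (truncated SHA-256; any good hash works)
            h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
            j = h & (m - 1)                        # low b bits select a register
            w = h >> b                             # remaining 64-b bits feed the rank
            rank = (64 - b) - w.bit_length() + 1   # position of the leftmost 1-bit
            registers[j] = max(registers[j], rank)
        # Harmonic-mean estimate with the usual bias-correction constant alpha_m
        alpha = 0.7213 / (1 + 1.079 / m)
        return alpha * m * m / sum(2.0 ** -r for r in registers)

    # With b=10 (m=1024 registers) the relative error is about 3%:
    print(round(hll_estimate(range(100_000))))     # prints roughly 100000

The abstract's memory figures are consistent with these error formulas: at a 1.1% error target, HyperLogLog's $1.04/\sqrt{m}$ error requires $m \approx (1.04/0.011)^2 \approx 8{,}900$ five-bit registers, about 5.6 kilobytes, while HLL-TailCut's $1.0/\sqrt{m}$ error with three-bit registers requires $m \approx (1.0/0.011)^2 \approx 8{,}300$ registers, about 3 kilobytes.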
