Journal
IEEE-ACM TRANSACTIONS ON NETWORKING
Volume 28, Issue 2, Pages 433-446Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNET.2020.2970860
Keywords
Data streams; cardinality estimation; random hashing
Categories
Funding
- National Natural Science Foundation of China [61872080]
- Jiangsu Provincial Key Laboratory of Computer Networking Technology
- National Science Foundation of the United States [STC-1562485, CNS-1719222]
- Florida Cybersecurity Center
Ask authors/readers for more resources
Cardinality estimation is the task of determining the number of distinct elements (or the cardinality) in a data stream, under a stringent constraint that the input data stream can be scanned by just one single pass. This is a fundamental problem with many practical applications, such as traffic monitoring of high-speed networks and query optimization of Internet-scale database. To solve the problem, we propose an algorithm named HLL-TailCut, which implements the estimation standard error $1.0 / \sqrt {m}$ using the memory units of four or three bits each, whose cost is much smaller than the five-bit memory units used by HyperLogLog, the best previously known cardinality estimator. This makes it possible to reduce the memory cost of HyperLogLog by 20%-45%. For example, when the target estimation error is 1.1%, state-of-the-art HyperLogLog needs 5.6 kilobytes memory. By contrast, our new algorithm only needs 3 kilobytes memory consumption for attaining the same accuracy. Additionally, our algorithm is able to support the estimation of very large stream cardinalities, even on the Tera and Peta scale.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available