4.5 Article

Identifying and Estimating Persistent Items in Data Streams

Journal

IEEE-ACM TRANSACTIONS ON NETWORKING
Volume 26, Issue 6, Pages 2429-2442

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNET.2018.2865125

Keywords

Persistent item; identification; estimation; Bloom filter; Raptor code

Funding

  1. National Natural Science Foundation of China [61502229, 61872178, 61373130, 61672276, 61472184, 61321491]
  2. Fundamental Research Funds for the Central Universities [021014380079]
  3. Key Research and Development Project of Jiangsu Province [BE2015154, BE2016120]
  4. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University
  5. National Science Foundation [CNS-1616273, CNS-1616317]
  6. Jiangsu Innovation and Entrepreneurship (Shuangchuang) Program

Ask authors/readers for more resources

This paper addresses the fundamental problem of finding persistent items and estimating the number of times each persistent item occurred in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can not only accurately identify each persistent item with a probability greater than any desired false negative rate (FNR), but can also accurately estimate the number of occurrences of each persistent item. The key idea of PIE is that it uses Raptor codes to encode the ID of each item that appears at the observation point during a measurement period and stores only a few bits of the encoded ID in the memory. The item that is persistent occurs in enough measurement periods that enough encoded bits for the ID can be retrieved from the observation point to decode them correctly and get the ID of the persistent item. To estimate the number of occurrences of any given persistent item, PIE uses maximum likelihood estimation-based statistical techniques on the information already recorded during the measurement periods. We implemented and evaluated PIE using three real network traffic traces and compared its performance with three prior schemes. Our results show that PIE not only achieves the desire FNR in every scenario, its average FNR can be 19.5 times smaller than the FNR of the adapted prior scheme. Our results also show that PIE achieves any desired success probability in estimating the number of occurrences of persistent items.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available