Article

FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams

Journal

SENSORS
Volume 21, Issue 4, Pages -

Publisher

MDPI
DOI: 10.3390/s21041080

Keywords

probability density estimation; streaming data; sensor system

Funding

  1. Samsung Science and Technology Foundation [SSTF-BA1501-52]
  2. Samsung Research Funding & Incubation Center of Samsung Electronics [SRFC-IT1801-10]

Efficient and accurate estimation of the probability distribution of non-stationary data streams is a crucial problem in sensor systems and requires agile adaptation to concept drift. The proposed FlexSketch algorithm uses an ensemble of histograms to estimate the probability distribution, detects and responds to concept drift swiftly, and achieves high update speed and accuracy with limited memory. Experimental results show improved speed and accuracy over existing methods for both stationary and non-stationary data streams.
Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., when its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation to concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram with every new data sample and generates a probability distribution by combining the ensemble of histograms, while periodically monitoring the discrepancy between recent data and the existing models. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm achieves better speed and accuracy than existing methods for both stationary and non-stationary data streams.
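
The update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the bin layout, the discrepancy measure, or how the histograms are combined, so the fixed equal-width bins, the total variation distance, the unweighted average, and every name and parameter here (`HistogramEnsembleSketch`, `window`, `threshold`, `max_models`) are assumptions made for illustration only.

```python
from collections import deque


class HistogramEnsembleSketch:
    """Illustrative histogram ensemble; all design choices are assumptions."""

    def __init__(self, lo, hi, n_bins=10, max_models=2, window=100, threshold=0.25):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.width = (hi - lo) / n_bins
        self.max_models = max_models      # assumed memory cap on the ensemble
        self.window = window              # assumed period of the drift check
        self.threshold = threshold        # assumed drift threshold
        self.models = deque([[0] * n_bins])   # oldest histogram first
        self.recent = deque(maxlen=window)    # bin indices of recent samples
        self.seen = 0
        self.drifts = 0

    def _bin(self, x):
        # Map a value to a bin index, clamping out-of-range values.
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.n_bins - 1)

    @staticmethod
    def _normalize(counts):
        total = sum(counts)
        return [c / total for c in counts] if total else [0.0] * len(counts)

    def _mass(self):
        # Combine the ensemble as an unweighted average of normalized histograms.
        active = [self._normalize(m) for m in self.models if sum(m) > 0]
        if not active:
            return [0.0] * self.n_bins
        return [sum(col) / len(active) for col in zip(*active)]

    def update(self, x):
        # Every histogram in the ensemble receives the new sample.
        b = self._bin(x)
        for m in self.models:
            m[b] += 1
        self.recent.append(b)
        self.seen += 1
        if self.seen % self.window == 0:
            self._check_drift()

    def _check_drift(self):
        # Compare recent data with the current estimate (total variation distance).
        recent_counts = [0] * self.n_bins
        for b in self.recent:
            recent_counts[b] += 1
        p = self._normalize(recent_counts)
        q = self._mass()
        tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if tv > self.threshold:
            # On drift: add a histogram seeded with the recent window and
            # drop the oldest one once the ensemble exceeds its cap.
            self.models.append(recent_counts)
            if len(self.models) > self.max_models:
                self.models.popleft()
            self.drifts += 1

    def pdf(self, x):
        # Piecewise-constant density estimate at x.
        if not (self.lo <= x <= self.hi):
            return 0.0
        return self._mass()[self._bin(x)] / self.width
```

In this sketch, feeding samples through `update` and then calling `pdf` gives a density that follows the stream: once the data move to a new region, the periodic check adds histograms built from recent samples and eventually discards the one dominated by old data. Seeding the new histogram with the recent window is one possible choice; starting it empty would also fit the description in the abstract.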
