Article

Clustering of Data Streams With Dynamic Gaussian Mixture Models: An IoT Application in Industrial Processes

Journal

IEEE INTERNET OF THINGS JOURNAL
Volume 5, Issue 5, Pages 3533-3547

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JIOT.2018.2840129

Keywords

Concept drift; data stream; dynamic clustering; Gaussian mixture models (GMM); industrial Internet of Things (IIoT)

Funding

  1. Spanish Centre for the Development of Industrial Technology through LearnIIoT Project [IDI-20180156]
  2. Spanish Ministry of Economy and Competitiveness [TIN2016-79684-P]
  3. Regional Government of Madrid [S2013/ICE-2845-CASICAM-CM]
  4. Fundacion BBVA

In industrial Internet of Things applications, where sensors send dynamic process data at high speed, producing actionable insights at the right time is challenging. A key problem is processing a large amount of data while the underlying dynamic phenomena related to the machine may be evolving over time due to factors such as degradation. As a result, any actionable model eventually becomes obsolete and must be updated. To cope with this problem, in this paper we propose a new unsupervised learning algorithm based on Gaussian mixture models, called Gaussian-based dynamic probabilistic clustering (GDPC). It is mainly based on integrating and adapting three well-known algorithms for use in dynamic scenarios: the expectation-maximization (EM) algorithm to estimate the model parameters, and the Page-Hinkley test and Chernoff bound to detect concept drifts. Unlike other unsupervised methods, the model induced by the GDPC provides the membership probabilities of each instance to each cluster. This makes it possible to determine, through a Brier score analysis, the robustness of the instance assignment and its evolution each time a concept drift is detected. The algorithm also works with very little data and significantly less computing power, and is able to decide whether (and when) to change the model. The algorithm is tested using synthetic data and data streams from an industrial testbed, where different operational states are automatically identified, giving good results in terms of classification accuracy, sensitivity, and specificity.
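
To make the drift-detection step mentioned in the abstract more concrete, the following is a minimal sketch of a standard Page-Hinkley test on a one-dimensional stream. It is not the GDPC implementation from the paper. The class name `PageHinkley`, the helper `first_drift`, and the parameters `delta` (tolerated deviation) and `threshold` (alarm level) are illustrative assumptions. So is the monitored quantity, which here is simply a stream of numbers (for example, a per-instance fit score under the current mixture).

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Textbook Page-Hinkley test for an upward shift in the mean of a stream.

    All names and default values here are illustrative assumptions,
    not taken from the paper.
    """
    delta: float = 0.005      # magnitude of change that is tolerated
    threshold: float = 50.0   # alarm level (often written as lambda)
    n: int = 0                # number of values seen so far
    mean: float = 0.0         # running mean of the stream
    cum: float = 0.0          # cumulative deviation m_t
    cum_min: float = 0.0      # minimum of m_t seen so far

    def update(self, x: float) -> bool:
        """Consume one value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_drift(stream, delta=0.005, threshold=50.0):
    """Return the index of the first value that triggers the test, or None."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 5 at index 200.
    stream = [0.0] * 200 + [5.0] * 200
    print(first_drift(stream))
```

In a dynamic clustering setting, a signal from such a detector would typically be the point at which the mixture is re-estimated with EM on recent data. How GDPC combines this test with the Chernoff bound is described in the paper itself and is not reproduced here.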
