Article

GLEAN: Generalized-Deduplication-Enabled Approximate Edge Analytics

Journal

IEEE INTERNET OF THINGS JOURNAL
Volume 10, Issue 5, Pages 4006-4020

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JIOT.2022.3166455

Keywords

Approximate analytics; data compression; edge computing; Internet of Things (IoT)

The Internet of Things (IoT) has led to a significant increase in sensor data, necessitating efficient and novel solutions for data transmission, storage, and analytics in sustainable IoT ecosystems. This article stress tests existing methods for direct analytics of generalized deduplication (GD) compressed data and identifies the need to optimize GD for analytics. A new version of GD is developed, and a framework called generalized deduplication-enabled approximate edge analytics (GLEAN) is proposed to address challenges related to data transmission, storage, and analytics in the IoT. GLEAN achieves a median increase in $k$-means clustering error of only 2% relative to analytics on uncompressed data, while running $7.5\times$ faster and requiring $3.9\times$ less storage at the Edge server compared to universal compressors.
The Internet of Things (IoT) has brought about exponential growth in sensor data. This has led to increasing demands for efficient and novel data transmission, storage, and analytics solutions for sustainable IoT ecosystems. It has been shown that the generalized deduplication (GD) compression algorithm offers not only competitive compression ratio and throughput but also random access properties that enable direct analytics of compressed data. In this article, we thoroughly stress test existing methods for direct analytics of GD compressed data with a diverse collection of 103 data sets, identify the need to optimize GD for analytics, and develop a new version of GD to this end. We also propose the generalized deduplication-enabled approximate edge analytics (GLEAN) framework. This framework applies the aforementioned analytics techniques at the Edge server to deliver end-to-end lossless data compression and high-quality Edge analytics in the IoT, thereby addressing challenges related to data transmission, storage, and analytics. Impressive analytics performance was achieved using this framework, with a median increase in $k$-means clustering error of just 2% relative to analytics performed on uncompressed data, while running $7.5\times$ faster and requiring $3.9\times$ less storage at the Edge server compared to universal compressors.
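
To make the idea of random access and direct analytics on GD compressed data more concrete, the following is a toy sketch and not the authors' implementation. It assumes that each sample is a non-negative integer split into a "base" (the high bits) and a "deviation" (the low bits), that each distinct base is stored only once, and that approximate analytics read the bases alone. The function names (`gd_compress`, `gd_read`, `gd_approximate`) and the parameter `dev_bits` are hypothetical and chosen only for illustration; the abstract does not specify how GLEAN selects the split.

```python
from typing import Dict, List, Tuple


def gd_compress(samples: List[int], dev_bits: int) -> Tuple[List[int], List[int], List[int]]:
    """Split each non-negative sample into base (high bits) and deviation
    (low bits), storing each distinct base only once."""
    mask = (1 << dev_bits) - 1
    bases: List[int] = []            # dictionary of unique bases
    base_index: Dict[int, int] = {}  # base value -> position in `bases`
    ids: List[int] = []              # per-sample pointer into `bases`
    deviations: List[int] = []       # per-sample low bits, stored verbatim
    for s in samples:
        base, dev = s >> dev_bits, s & mask
        if base not in base_index:
            base_index[base] = len(bases)
            bases.append(base)
        ids.append(base_index[base])
        deviations.append(dev)
    return bases, ids, deviations


def gd_read(i: int, bases: List[int], ids: List[int],
            deviations: List[int], dev_bits: int) -> int:
    """Random access: losslessly rebuild sample i without touching the others."""
    return (bases[ids[i]] << dev_bits) | deviations[i]


def gd_approximate(i: int, bases: List[int], ids: List[int], dev_bits: int) -> int:
    """Approximate sample i from its base only, using the midpoint of the
    value range that the base covers (deviations are never read)."""
    return (bases[ids[i]] << dev_bits) + (1 << (dev_bits - 1))


samples = [1000, 1003, 1001, 2050, 2049, 1002]
dev_bits = 4
bases, ids, deviations = gd_compress(samples, dev_bits)
restored = [gd_read(i, bases, ids, deviations, dev_bits) for i in range(len(samples))]
approx = [gd_approximate(i, bases, ids, dev_bits) for i in range(len(samples))]
```

In this sketch the six samples share only two bases, so an analytics routine such as $k$-means could operate on the small set of bases (weighted by how often each occurs) instead of on every decompressed sample. The approximation error per sample is bounded by half the deviation range, which is the trade-off between compression and analytics accuracy that the abstract describes.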
