Article

Increasing the Huffman generation code algorithm to equalize compression ratio and time in lossless 16-bit data archiving

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 82, Issue 16, Pages 24031-24068

Publisher

SPRINGER
DOI: 10.1007/s11042-022-14130-1

Keywords

Huffman; Compression; Binary; Lossless


Abstract

Compression is an essential part of digitizing data, especially amid the growth of the Big Data era. Lossless compression reduces the size of data under the condition that it can be fully restored to its original form during decompression. One common purpose of lossless compression is archiving; archived files are often RAW and large, with at least a 16-bit file system (65,536 possible values). Huffman's algorithm remains highly effective for compressing 8-bit data and can be grouped into Static, Dynamic, and Adaptive variants, but its performance is unpredictable on data with complex variables and probability distributions, such as WAV-format audio. Based on a literature review, compression performance for file archiving is measured with the Compression Ratio (CR) and Compression Time (CT) indicators. This research produced a new scheme, named 4-ary/MQ, whose architecture is based on entropy coding rooted in the static, dynamic, and adaptive variants of the Huffman scheme. Its variable-length codes follow quad-tree dynamic branching (the FGK rule), and node symbols are assigned adaptively: at most 2 dummy variables with the value '0' are added so that every node after the root always has 4 branches. Descriptive analysis of the compression results (deviation, average, ANOVA, and DMRT) shows that 4-ary/MQ achieves an optimal CR with fast CT compared with the various Huffman variants and with other lossless compression applications (PKZIP, WinZip, 7-Zip, and Monkey's Audio).
Trial analysis based on manual mathematical and statistical calculations confirms that 4-ary/MQ delivers high compression with a very fast process, making it beneficial for compressing data on local storage media, hosting/cloud, and over bandwidth-constrained links.
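The quad-tree padding idea described in the abstract, adding at most two zero-frequency dummy symbols so that every merge step yields exactly four branches, can be sketched with a generic 4-ary Huffman code-length calculator. This is an illustrative sketch only, not the authors' 4-ary/MQ implementation; the function name and structure are assumptions.

```python
import heapq

def kary_huffman_lengths(freqs, k=4):
    """Return {symbol: code length} for a k-ary Huffman code.

    Pads the symbol set with zero-frequency dummies so that every
    merge step combines exactly k nodes; for k = 4 this adds at
    most 2 dummies, mirroring the '0'-valued padding variables
    described in the abstract.
    """
    n = len(freqs)
    if n <= 1:
        return {s: 1 for s in freqs}
    pad = (1 - n) % (k - 1)  # dummies so (leaves - 1) % (k - 1) == 0
    # Heap entries: (frequency, unique id, real symbols in this subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heap += [(0, n + j, []) for j in range(pad)]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    uid = n + pad
    while len(heap) > 1:
        total, syms = 0, []
        for _ in range(k):  # merge the k least-frequent nodes
            f, _, sub = heapq.heappop(heap)
            total += f
            syms += sub
        for s in syms:      # each merged symbol gains one 4-ary digit
            lengths[s] += 1
        heapq.heappush(heap, (total, uid, syms))
        uid += 1
    return lengths
```

With four equally likely symbols every code is a single quaternary digit; skewed distributions assign shorter codes to frequent symbols, and the resulting lengths always satisfy the Kraft inequality for a 4-symbol alphabet.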

