Skip to main content
Log in

Increasing the Huffman generation code algorithm to equalize compression ratio and time in lossless 16-bit data archiving

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Compression is a process that is always carried out in terms of digitizing data, which is considered very important, especially in the development and growth of the Big Data era. Lossless compression is the process of reducing the size of the data but with the condition that it can be returned to its original source during the decompression process. One of the purposes of doing Lossless compression is to archive a file, usually the file is RAW and has a large file size with a minimum 16-Bit file system (65,536 possible differences in values). Huffman’s algorithm is currently still very effective in compressing 8-bit data, which can be grouped into Static, Dynamic, and Adaptive extensions, but its performance cannot be determined if it is performed on data with complex variables and probabilities such as WAV format audio data. Based on a literature review, the compression performance measurement for file archiving uses the Compression Ratio (CR) and Compression Time (CT) indicators. This research resulted in a new scheme which we named 4-ary/MQ, the architectural basis of which is based on entropy coding rooted in the static, dynamic and adaptive variants of the Huffman scheme. For the variable code length characteristics, it follows the Quad Tree dynamic branching (FGK rule), the node symbol setting adopts an adaptive method, namely adding a maximum of 2 variables with a value of ‘0’ to maintain the root of the branch after the root always has 4 branches. Based on descriptive analysis of compression results, deviation, average, ANOVA and DMRT, 4-ary/MQ produces optimal CR with fast CT when compared to various variants of the Huffman algorithm and other lossless compression applications such as (PKZIP, WinZip, 7-Zip, and Monkeys Audio). From the results of trial analysis based on manual mathematical and statistical calculations, it is certain that 4-ary/MQ provides high compression results with a very fast process, so it has many benefits if it is used to compress data on local storage media, hosting/cloud and bandwidth.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Alakuijala J, Vandevenne L (2013) Data compression using Zopfli. Google Inc. Css-Ig.Net. Available at: http://css-ig.net/data/zopfli-deflate-in.pdf

  2. Alakuijala J, Kliuchnikov E, Szabadka Z, Vandevenne L (2016) Comparison of Brotli, deflate, Zopfli, LZMA, LZHAM and Bzip2 compression algorithms. Google Inc. Css-Ig.Net

  3. Alakuijala J et al (2019) Brotli. ACM Trans Inf Syst 37(1):1–30 Available at: http://dl.acm.org/citation.cfm?doid=3289475.3231935

    Article  Google Scholar 

  4. Al-Bahadili H, Hussain SM (2010) A bit-level text compression scheme based on the ACW algorithm. Int J Autom Comput 7(1):123–131. https://doi.org/10.1007/s11633-010-0123-6

    Article  Google Scholar 

  5. Ali AS, Hussein AS, Tolba MF, Yousef AH (2011) Large-scale vector data visualization using high performance computing. J Software 6(2):298–305. https://doi.org/10.4304/jsw.6.2.298-305

  6. Arshad R, Saleem A, Khan D (2016) Performance comparison of Huffman Coding and Double Huffman Coding. In: 2016 Sixth International Conference on Innovative Computing Technology (INTECH). IEEE, pp 361–364

  7. Baer MB (2006) A general framework for codes involving redundancy minimization. IEEE Trans Inf Theory 52(1):344–349 Available at: http://www.indianjournals.com/ijor.aspx?target=ijor:quest&volume=13&issue=1&article=005

    Article  MathSciNet  MATH  Google Scholar 

  8. Benjamin A (2010) Music compression algorithms and why you should care. "Whitepaper: http://ese.wustl.edu/ContentFiles/Research/UndergraduateResearch/CompletedProjects/Web-Pages/su10/AlexBenjamin_AudioCompression.pdf. Created 9 Dec 2010

  9. Bentley JL, Sleator DD, Tarjan RE, Wei VK (1986) A locally adaptive data compression scheme. Commun ACM 29(4):320–330 Available at: http://portal.acm.org/citation.cfm?doid=5684.5688

    Article  MathSciNet  MATH  Google Scholar 

  10. Bosch JJ, Janer J, Fuhrmann F, Herrera P (2012) A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012, (Ismir), pp 559–564. Available at: http://mtg.upf.es/system/files/publications/Bosch-ISMIR2012.pdf

  11. Bosch JJ, Marxer R, Gómez E (2016) Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music. J New Music Res 45(2):101–117. https://doi.org/10.1080/09298215.2016.1182191

    Article  Google Scholar 

  12. Breaban MC, Graur A, Potorac AD, Balan DG (2018) Bandwidth management application in directory service environment. 2018 international conference on development and application systems (DAS). May 2018 IEEE, pp. 88–92

  13. Bryukhanov YA, Lukashevich YA (2017) Nonlinear distortion modulating signals in quantization. 2017 IEEE East-West Design & Test Symposium (EWDTS). IEEE, pp 1–5

  14. Calim AS, Kaya C, Yuksel H (2021) Big data based archiving management system. In: 2021 6th International Conference on Computer Science and Engineering (UBMK). IEEE, pp 21–26

  15. Chambliss D, Pandey P, Thakur T, Fleshler A, Clark T, Ruddy JA, Gougherty KD, Kalos M, Merithew L, Thompson JG, Yudenfriend HM (2008) An architecture for storage-hosted application extensions. IBM J Res Dev 52(4–5):427–437. https://doi.org/10.1147/rd.524.0427

  16. Chen HC, Wang YL, Lan YF (1999) A memory-efficient and fast Huffman decoding algorithm. Inf Process Lett 69(3):119–122

    Article  MathSciNet  MATH  Google Scholar 

  17. Chowdhury RA, Kaykobad M, King I (2002) An efficient decoding technique for Huffman codes. Inf Process Lett 81(6):305–308.https://doi.org/10.1016/S0020-0190(01)00243-5

  18. Chung K (1997) Efficient Huffman decoding. Inf Process Lett 61:97–99

    Article  MathSciNet  MATH  Google Scholar 

  19. Djusdek DF, Studiawan H, Ahmad T (2016) Adaptive image compression using adaptive Huffman and LZW. In: 2016 international conference on Information & Communication Technology and systems (ICTS). IEEE, pp. 101–106

  20. Elakkiya S, Thivya KS (2022) Comprehensive review on lossy and lossless compression techniques. Journal of The Institution of Engineers (India): Series B, 103(3):1003–1012. Available at: https://link.springer.com/10.1007/s40031-021-00686-3

  21. Faller N (1973) An adaptive system for data compression. In record of the 7th Asilomar conference on circuits, systems, and computers. 1973 pp. 593–597

  22. Fenwick PM (1995) Huffman code efficiencies for extensions of sources. IEEE transactions on communications 43(2/3/4):163–165 Available at: http://ieeexplore.ieee.org/document/380027/

  23. Fonseca E et al. (2019) Learning sound event classifiers from web audio with Noisy labels. ICASSP, IEEE international conference on acoustics, speech and signal processing - proceedings, 2019-may(688382), pp.21–25. https://doi.org/10.1109/ICASSP.2019.8683158

  24. Fruchtman A, Gross Y, Klein ST, Shapira D (2020) Weighted adaptive Huffman coding. 2020 data compression conference (DCC). March 2020 IEEE, pp. 368–368

  25. Gallager R (1978) Variations on a theme by Huffman. IEEE Trans Inf Theory 24(6):668–674. https://doi.org/10.1109/TIT.1978.1055959

  26. Guido RC (2016) ZCR-aided neurocomputing: a study with applications. Knowl-Based Syst 105:248–269. https://doi.org/10.1016/j.knosys.2016.05.011

  27. Gupta A, Bansal A, Khanduja V (2017) Modern lossless compression techniques: review, comparison and analysis. In: 2017 second international conference on electrical, computer and communication technologies (ICECCT). IEEE, pp 1–8

  28. Habib A, Rahman MS (2017) Balancing decoding speed and memory usage for Huffman codes using quaternary tree. Appl Inf 4:1–5 Available at: http://applied-informatics-j.springeropen.com/articles/10.1186/s40535-016-0032-z

    Google Scholar 

  29. Habib A, Hoque ASML, Hussain MR (2013) H-HIBASE: compression enhancement of HIBASE technique using Huffman coding. J Comput 8(5):1175–1183 Available at: http://ojs.academypublisher.com/index.php/jcp/article/view/8706

    Article  Google Scholar 

  30. Habib A, Islam MJ, Rahman MS (2018) Huffman based code generation algorithms: data compression perspectives. J Comput Sci 14(12):1599–1610. https://doi.org/10.3844/jcssp.2018.1599.1610

    Article  Google Scholar 

  31. Hashemian R (1995) Memory efficient and high-speed search Huffman coding. IEEE Trans Commun 43(10):2576–2581. https://doi.org/10.1109/26.469442

  32. Hermassi H, Rhouma R, Belghith S (2010) Joint compression and encryption using chaotically mutated Huffman trees. Commun Nonlinear Sci Numer Simul 15(10):2987–2999 Available at: https://linkinghub.elsevier.com/retrieve/pii/S1007570409006108

    Article  MathSciNet  MATH  Google Scholar 

  33. Hidayat T (2019) Reformat the file uncompressed into lossy based on audio compression method using Huffman shift coding scheme. International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) 8(1.5):317–326. https://doi.org/10.30534/ijatcse/2019/5381.52019

  34. Hidayat T, Zakaria MH, Pee ANC (2018) Lossless coding scheme for data audio 2 channel using huffman and shannon-Fano. J Theor Appl Inf Technol 96(11):3467–3477

    Google Scholar 

  35. Hidayat T, Zakaria MH, Che Pee N (2019a) A critical assessment of advanced coding standards for lossless audio compression. Int J Simul: Syst, Sci Technol 19(5):31.1–31.10. https://doi.org/10.5013/IJSSST.a.19.05.31

    Article  Google Scholar 

  36. Hidayat T, Zakaria MH, Che Pee N (2019b) Comparison of lossless compression schemes for WAV audio data 16-bit between Huffman and coding arithmetic. Int J Simul: Syst, Sci Technol 19(6):36.1–36.7. https://doi.org/10.5013/IJSSST.a.19.06.36

    Article  Google Scholar 

  37. Hidayat T, Zakaria MH, Pee ANC (2020) Survey of performance measurement indicators for lossless compression technique based on the objectives. 2020 3rd international conference on information and communications technology (ICOIACT). IEEE, pp 170–175

  38. Huffman D (1952) A method for the construction of minimum-redundancy codes. Proc IRE 40(9):1098–1101. https://doi.org/10.1007/BF02839372

    Article  MATH  Google Scholar 

  39. Hussain AJ, Al-Fayadh A, Radi N (2018) Image compression techniques: a survey in lossless and lossy algorithms. Neurocomputing 300:44–69. https://doi.org/10.1016/j.neucom.2018.02.094

    Article  Google Scholar 

  40. Islam MT, Ema RR, Islam MR, Islam T (2019) A lossless bit isolation algorithm for data compression by using static dictionary. In: 2019 international conference on computer, communication, chemical, materials and electronic engineering (IC4ME2), IEEE, pp 1–5

  41. Johnson PD Jr, Harris GA, Hankerson DC (2003) Introduction to Information Theory and Data Compression, Chapman and Hall/CRC

  42. Katona G, Nemetz O (1976) Huffman codes and self-information. IEEE Trans Inf Theory 22(3):337–340. https://doi.org/10.1109/TIT.1976.1055554

  43. Kaur M (2015) Lossless text data compression algorithm using modified Huffman coding - a review. Int J Advan Technol Eng Sci 4(4):1273–1276

    Google Scholar 

  44. Kavousianos X, Kalligeros E, Nikolos D (2008) Test data compression based on variable-to-variable Huffman encoding with Codeword reusability. IEEE Trans Comput-Aided Design Integra Circ Syst 27(7):1333–1338 Available at: http://ieeexplore.ieee.org/document/4544866/

    Article  Google Scholar 

  45. Knuth DE (1985) Dynamic huffman coding. J Algorithms 6(2):163–180 Available at: https://linkinghub.elsevier.com/retrieve/pii/0196677485900367

    Article  MathSciNet  MATH  Google Scholar 

  46. Kodituwakku SR, Amarasinghe US (2010) Comparison of lossless data compression algorithms for text data. Indian J Comput Sci Eng 1(4):416–425

    Google Scholar 

  47. Li S, Lian G, Peng Z (2015) Research on the loss-less compression algorithm of the ultrasonic testing of rail. In: 2015 Symposium on Piezoelectricity, Acoustic Waves, and Device Applications (SPAWDA). IEEE, pp. 458–461

  48. Li Q, Li H, Zhang K (2019) Storage performance prediction. In: 2019 IEEE 10th international conference on software engineering and service science (ICSESS). IEEE, pp. 1–4

  49. Lin YK, Huang SC, Yang CH (2012) A fast algorithm for Huffman decoding based on a recursion Huffman tree. J Syst Softw 85(4):974–980. https://doi.org/10.1016/j.jss.2011.11.1019

    Article  Google Scholar 

  50. McAnlis C, Haecky A (2016). Understanding compression: data compression for modern developers 1st ed. O’Reilly Media, Inc.

  51. Mentzer F, Tschannen M (2020) Learning better lossless compression using lossy compression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.6638--6647.

  52. Metwally A, Agrawal D, El Abbadi A (2004) Efficient computation of frequent and top-k elements in data streams. In: Proceedings of the 10th international conference on database theory, pp 398–412

  53. Mustafa R (2017) An improved decoding technique for efficient Huffman coding. J Comput Sci Appl Info Technol 2(1):1–5 Available at: http://symbiosisonlinepublishing.com/computer-science-technology/computerscience-information-technology10.php

    MathSciNet  Google Scholar 

  54. Nam LH, Hung PD (2021) Building a big data oriented architecture for Enterprise integration. In: pp. 172–182

  55. Nobre ICM, Frossard P (2019) Optimized quantization in distributed graph signal processing. ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 5376–5380

  56. Oommen BJ, Rueda L (2004) A new family of weak estimators for training in non-stationary distributions, pp 644–652.https://doi.org/10.1007/978-3-540-27868-9_70

  57. Paakkonen J et al (2019) File size distributions and caching for offloading. IEEE workshop on signal processing advances in wireless communications. SPAWC. pp.1–5

  58. Pal R (2021) Speech compression with wavelet transform and Huffman coding. 2021 international conference on communication information and computing technology (ICCICT). 25 June 2021 IEEE, pp. 1–4

  59. Ramya KA, Pushpa M (2016) A survey on lossless and Lossy data compression methods. Int J Comput Sci Eng Commun 4(1):1277–1280

    Google Scholar 

  60. Salomon D, Motta G (2010) Handbook of data compression. Springer London, London

    Book  MATH  Google Scholar 

  61. Sayood K (2018) Huffman coding. Introduction to Data Compression. Elsevier, pp 41–88

  62. Schack  R The length of a typical Huffman codeword. IEEE Trans Inf Theory 40(4) 1246–1247. https://doi.org/10.1109/18.335944

  63. Shannon CE (1948) A mathematical theory of communication. Bell Syst Techn J 27(3):379–423 Available at: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

    Article  MathSciNet  MATH  Google Scholar 

  64. Shao Z et al (2022) A high-throughput VLSI architecture design of canonical Huffman encoder. IEEE Trans Circ Syst II: Express Briefs 69(1):209–213 (Available at: https://ieeexplore.ieee.org/document/9462403/)

  65. Sharma M (2010) Compression using Huffman coding. IJCSNS Int J Comput Sci Network Sec 10(5):133 Available at: http://paper.ijcsns.org/07_book/201005/20100520.pdf

    Google Scholar 

  66. Singh R, Bikramjeet B (2013) A survey on different compression techniques and bit reduction algorithm for compression of text/lossless data. Int J Advanc Res Comput Sci Software Eng 3(3):2277–2128

    Google Scholar 

  67. Snyder D, Chen G, Povey D (2015) MUSAN: A Music, speech, and noise corpus (October). Available at: http://arxiv.org/abs/1510.08484, https://doi.org/10.48550/arXiv.1510.08484

  68. Suri PR, Goel M (2011) Ternary tree and memory-efficient Huffman decoding algorithm. Int J Comput Sci Issues 8(1):483–489

    Google Scholar 

  69. Szpankowski W, Verdu S (2011) Minimum expected length of fixed-to-variable lossless compression without prefix constraints. IEEE Trans Inf Theory 57(7):4017–4025 Available at: https://ieeexplore.ieee.org/document/5895099/

  70. Trum C (2020) Realizing the benefits of digitalization through standards. Nat Gas Electric 36(7):9–17. https://doi.org/10.1002/gas.22157

    Article  Google Scholar 

  71. Uthayakumar J, Vengattaraman T, Dhavachelvan P (2018) A survey on data compression techniques: from the perspective of data quality, coding schemes, data type and applications. J King Saud Univ - Comput Info Sci 33:119–140. https://doi.org/10.1016/j.jksuci.2018.05.006

    Article  Google Scholar 

  72. Vitter JS (1987) Algorithm 673: dynamic Huffman coding. ACM Trans Math Software (TOMS) 15(2):158–167

    Article  MATH  Google Scholar 

  73. Welch (1984) A technique for high-performance data compression. Computer 17(6):8–19. https://doi.org/10.1109/MC.1984.1659158

  74. Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23(3):337–343 Available at: http://ieeexplore.ieee.org/document/1055714/

    Article  MathSciNet  MATH  Google Scholar 

  75. Zuo L, Zhu MM (2020) Minimize cost of data transfers using bandwidth reservation on FPVB paths of dynamic HPNs. 2020 international conference on computing, networking and communications (ICNC). February 2020 IEEE. pp. 74–78

Download references

Acknowledgements

This paper is an ongoing research on more in-depth lossless compression technique for data 16-bit files. Gratitude to Universitas Amikom Yogyakarta and Universiti Teknikal Malaysia Melaka, and everyone who provided substantial support in this study.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tonny Hidayat.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest and certify NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hidayat, T., Zakaria, M.H. & Pee, A.N.C. Increasing the Huffman generation code algorithm to equalize compression ratio and time in lossless 16-bit data archiving. Multimed Tools Appl 82, 24031–24068 (2023). https://doi.org/10.1007/s11042-022-14130-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-022-14130-1

Keywords

Navigation