Abstract
As diverse types of data grow explosively, large-scale data storage, backup, and transmission have become challenging, motivating many researchers to propose efficient universal compression algorithms for multi-source data. In recent years, the emergence of hardware accelerators such as GPUs, TPUs, DPUs, and FPGAs has removed the performance bottleneck of neural networks (NNs), making NN-based compression algorithms increasingly practical and popular. However, no survey of NN-based universal lossless compressors has yet been conducted, and unified evaluation metrics are lacking. To address these problems, we present in this paper a holistic survey together with benchmark evaluations. Specifically, i) we thoroughly investigate NN-based lossless universal compression algorithms for multi-source data and classify them into three types: static pre-training, adaptive, and semi-adaptive; ii) we unify 19 evaluation metrics to comprehensively assess the compression effectiveness, resource consumption, and model performance of compressors; iii) we conduct more than 4,600 CPU/GPU hours of experiments to evaluate 17 state-of-the-art compressors on 28 real-world datasets spanning text, image, video, audio, and other data types; and iv) we summarize the strengths and drawbacks of NN-based lossless data compressors and discuss promising research directions. We publish the results as the NN-based Lossless Compressors Benchmark (NNLCB, see the fahaihi.github.io/NNLCB website), which will be updated and maintained continuously in the future.
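To make the evaluated quantities concrete, the following minimal Python sketch (our own illustration; the function names are hypothetical and do not come from the paper's NNLCB implementation) computes a few standard size/speed metrics of the kind such benchmarks report, along with the Shannon-optimal code length that an NN predictor coupled with an entropy coder, such as an arithmetic coder, can approach.

    import math

    def size_speed_metrics(original_bytes: int, compressed_bytes: int, seconds: float) -> dict:
        # Standard lossless-compression metrics (a small subset of a full benchmark suite).
        return {
            "compression_ratio": original_bytes / compressed_bytes,    # higher is better
            "bits_per_byte": 8.0 * compressed_bytes / original_bytes,  # lower is better
            "space_saving": 1.0 - compressed_bytes / original_bytes,   # fraction of space saved
            "throughput_mb_s": original_bytes / seconds / 1e6,         # compression speed
        }

    def ideal_code_length_bits(true_symbol_probs) -> float:
        # Shannon code length in bits that an arithmetic coder driven by a predictive
        # model can approach: -sum(log2 p_i), where p_i is the probability the model
        # assigned to the symbol that actually occurred at step i.
        return -sum(math.log2(p) for p in true_symbol_probs)

    # Example: a 1 MB input compressed to 250 KB in 2 s gives ratio 4.0 and 2.0 bits/byte.
    print(size_speed_metrics(1_000_000, 250_000, 2.0))
    # A model assigning probability 0.5 to each of 8 observed symbols needs at least 8 bits.
    print(ideal_code_length_bits([0.5] * 8))

The second function reflects the principle shared by all three compressor types surveyed here: the better the model predicts the next symbol, the shorter the entropy-coded output.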
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 62272253 and 62272252) and the Fundamental Research Funds for the Central Universities. It was also supported in part by the China Scholarship Council (CSC202406200085) and the Innovation Project of Guangxi Graduate Education (YCBZ2024005). The High-performance Computing Center of Guangxi University partly supported the experimental work. The authors thank the editor and anonymous reviewers for their constructive comments and suggestions for improving our manuscript.
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Electronic supplementary material Supplementary material is available in the online version of this article at journal.hep.com.cn and link.springer.com.
Hui SUN obtained his BSc and MSc degrees in information security and computer science from China University of Mining and Technology and Guangxi University, China in 2019 and 2022, respectively. He is currently pursuing a PhD degree at the College of Computer Science, Nankai University, China, and is also a visiting student at the College of Computing and Data Science (CCDS), Nanyang Technological University (NTU), Singapore. His research interests include AI for data compression, deep learning, and parallel computing. He has authored technical papers in conferences and journals such as DCC, ICPADS, ISBRA, Journal on Communications, Journal of Chinese Computer Systems, Bioinformatics, and BMC Bioinformatics.
Huidong MA obtained his BSc and MSc degrees in computer science from Hainan University, China and Guangxi University, China in 2020 and 2023, respectively. He is currently pursuing a PhD degree in computer science at Nankai University, China. His main research interests include data storage systems, machine learning, AI for data compression, and large language models. He has authored technical papers in conferences and journals, such as DCC, ICPADS, ISBRA, Bioinformatics, and BMC Bioinformatics.
Feng LING obtained a BSc degree in information security from Northeast University, China in 2022. He is currently pursuing an MSc degree at Nankai University, China. He is a member of the Nankai-Baidu Joint Laboratory and the Parallel and Distributed Software Technology Laboratory. His research interests include AI for data compression, high-performance computing, parallel algorithm design, and neural network inference frameworks.
Haonan XIE is currently pursuing a PhD degree in automation at the School of Electrical Engineering, Guangxi University, China. He is a member of CAAI, IEEE, and IET. His main research interests include artificial-intelligence-engaged energy conversion, systems engineering modeling, and the compression and management of big data in the power industry. He has authored technical papers in journals and conferences such as RSER, AE, DCC, ICPADS, Bioinformatics, and BMC Bioinformatics.
Yongxia SUN received her BSc and MSc degrees in information management & systems and computer technology from Tianjin Agricultural University, China and Hunan University of Technology and Business, China. She is currently pursuing a PhD degree at the College of Computer Science, Nankai University, China. She is a member of the Nankai-Baidu Joint Laboratory and the Parallel and Distributed Software Technology Laboratory. Her research interests include data compression and storage, data auditing, machine learning, and blockchain applications. She has authored technical papers in Computer Networks, CMC, etc.
Liping YI is currently pursuing a PhD degree at the College of Computer Science, Nankai University, China, and is also a visiting student at the College of Computing and Data Science (CCDS), Nanyang Technological University (NTU), Singapore. Her research interests include federated learning. She has authored technical papers in conferences such as NeurIPS, ICML, MM, IJCAI, ICASSP, ICWS, and DASFAA, and in journals such as TSC, TMC, and KBS. She has served as a reviewer for conferences and workshops including NeurIPS, ICML, ICLR, KDD, AAAI, IJCAI, CVPR, MM, ICASSP, ICME, FL-IJCAI'23, FL@FM-NeurIPS'23, FL@FM-TheWebConf'24, and FL@FM-ICME'24, and for journals including TMC, TNNLS, TGCN, and Neurocomputing.
Meng YAN obtained her BSc and PhD degrees in physics and computer science from Tianjin University and Nankai University, China in 2014 and 2022, respectively. She is an assistant professor at Nankai University, conducting postdoctoral research at the Nankai-Baidu Joint Laboratory. Her main research areas include blockchain, machine learning, and AI for data compression. She has published technical papers in SRDS, Chinese Journal of Electronics, Bioinformatics, etc. She is currently a member of the editorial board of the journal Blockchain.
Cheng ZHONG received his PhD degree in computer science and technology from the University of Science and Technology of China, China. He is currently a professor with the School of Computer, Electronics and Information at Guangxi University, China. He has led several national and provincial research projects, published more than 150 journal/conference papers, and edited 5 books. His research interests include parallel computing, bioinformatics, distributed computing, and information security. He is an outstanding member of the China Computer Federation (CCF).
Xiaoguang LIU received his BSc, MSc, and PhD degrees in computer science from Nankai University, China in 1996, 1999, and 2002, respectively. He is currently a professor at the Department of Computer Science, Nankai University, China. His research interests include search engines, storage systems, GPU computing, and federated learning. He has authored technical papers in conferences such as DCC, ICML, AAAI, IJCAI, WWW, SIGIR, and VLDB, and in journals such as TC, TPDS, TOS, TKDE, TDSC, TMM, TNNLS, and TCSVT.
Gang WANG received his BSc, MSc, and PhD degrees in computer science from Nankai University, China in 1996, 1999, and 2002, respectively. He is currently a professor at the Department of Computer Science, Nankai University, China. His research interests include parallel computing, storage systems, data mining, machine learning, and federated learning. He has authored technical papers in conferences such as ICML, AAAI, IJCAI, WWW, SIGIR, DCC, VLDB, and ACM MM, and in journals such as TC, TPDS, TOS, TKDE, TDSC, TNNLS, and TCSVT.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, H., Ma, H., Ling, F. et al. A survey and benchmark evaluation for neural-network-based lossless universal compressors toward multi-source data. Front. Comput. Sci. 19, 197360 (2025). https://doi.org/10.1007/s11704-024-40300-5