TRACE: A Fast Transformer-based General-Purpose Lossless Compressor

ABSTRACT
Deep-learning-based compressors have attracted interest recently due to their much-improved compression ratios. However, modern approaches suffer from long execution times. To ease this problem, this paper targets cutting down the execution time of deep-learning-based compressors. Building history dependencies sequentially (e.g., with recurrent neural networks) is responsible for the long inference latency. Instead, we introduce the transformer into deep-learning compressors to build history dependencies in parallel. However, existing transformers are too computationally heavy and are incompatible with compression tasks.
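The sequential-versus-parallel distinction above can be illustrated with a toy sketch (illustrative only; the names `W`, `x`, and the dimensions are assumptions, not TRACE's actual model): an RNN-style recurrence must execute its T steps one after another, while causal self-attention produces all T context vectors in a single masked matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                       # toy sequence length and hidden size
x = rng.standard_normal((T, d))   # stand-in byte embeddings

# Sequential (RNN-style): step t cannot start before step t-1 finishes.
W = rng.standard_normal((d, d)) * 0.1
h = np.zeros(d)
states = []
for t in range(T):
    h = np.tanh(x[t] + W @ h)     # inherently serial dependency chain
    states.append(h)

# Parallel (attention-style): every position attends to all earlier
# positions at once via one masked matrix product.
scores = x @ x.T / np.sqrt(d)
mask = np.tril(np.ones((T, T), dtype=bool))            # causal mask
scores = np.where(mask, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
context = weights @ x             # all T context vectors in one pass
```

The loop's latency grows linearly with T regardless of hardware parallelism, whereas the attention pass is a fixed number of matrix operations, which is the property the paper exploits.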
This paper proposes TRACE, a fast general-purpose lossless compressor built on a compression-friendly structure based on a single-layer transformer. We first design a new metric to guide the selection of compression model structures. Byte-grouping and shared-FFN schemes are further proposed to fully utilize the capacity of the single-layer transformer. These features allow TRACE to achieve a competitive compression ratio at a much faster speed. In addition, we further accelerate the compression procedure by designing a controller that reduces the parameter-updating overhead. Experiments show that TRACE achieves an overall ~3x speedup while keeping a compression ratio comparable to state-of-the-art compressors. The source code for TRACE and links to the datasets are available at https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.
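For readers unfamiliar with the overall pipeline the abstract assumes, the following minimal sketch shows the adaptive modeling loop shared by deep-learning compressors (this is an illustration with a trivial count-based stand-in predictor, not TRACE's transformer): a model predicts a distribution over the next byte, an arithmetic coder would spend about -log2(p) bits on that byte, and the model is updated online so that a decoder running the same updates stays synchronized. The per-symbol `update` call is the parameter-updating overhead that TRACE's controller is designed to reduce.

```python
import numpy as np

class CountModel:
    """Stand-in predictor: adaptive order-0 byte frequencies."""
    def __init__(self):
        self.counts = np.ones(256)           # Laplace-smoothed counts

    def predict(self):
        return self.counts / self.counts.sum()

    def update(self, symbol):
        self.counts[symbol] += 1             # online model update

def ideal_code_length(data: bytes) -> float:
    """Bits an ideal arithmetic coder would emit with this model."""
    model, bits = CountModel(), 0.0
    for b in data:
        bits += -np.log2(model.predict()[b])  # cost of coding byte b
        model.update(b)                       # decoder mirrors this step
    return bits

print(ideal_code_length(b"aaaaabbbb"))  # repetitive data codes cheaply
```

A stronger predictor (such as a single-layer transformer over the byte history) lowers the per-byte cross-entropy and therefore the compressed size, at the price of slower prediction and updates, which is exactly the trade-off the paper targets.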