DOI: 10.1145/3577193.3593721
research-article

FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data

Published: 21 June 2023

Abstract

Error-bounded lossy compression is an effective way to cope with the volume of data produced by scientific applications: it can significantly reduce data size while letting users control distortion through a specified error bound. However, no existing error-bounded lossy compressor always obtains the best compression quality, because datasets differ widely in their characteristics. In this paper, we develop FAZ, a flexible and adaptive error-bounded lossy compression framework with a high capability of adapting to diverse datasets. FAZ consistently keeps the compression quality at the best level compared with other state-of-the-art compressors across different datasets. We perform a comprehensive evaluation using 6 real-world scientific applications and 6 other state-of-the-art error-bounded lossy compressors. Experiments show that, compared with the other existing lossy compressors, FAZ improves the compression ratio by up to 120%, 190%, and 75% at the same error bound, the same PSNR, and the same SSIM, respectively.
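
To make the terms in the abstract concrete, the following minimal sketch shows what an absolute error bound, a compression ratio, and PSNR mean for a toy compressor. It is not FAZ and does not reflect the paper's method: it assumes a plain uniform scalar quantizer followed by zlib, a synthetic 1D signal in place of a real scientific dataset, and the value-range convention for PSNR. All function names are hypothetical.

```python
import math
import struct
import zlib


def compress(values, error_bound):
    """Quantize each value to the nearest multiple of 2*error_bound, then deflate."""
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]          # integer quantization codes
    payload = struct.pack(f"<{len(codes)}q", *codes)   # store codes as 64-bit ints
    return zlib.compress(payload, 9)


def decompress(blob, error_bound):
    """Invert compress(): inflate the codes and scale them back to values."""
    step = 2.0 * error_bound
    payload = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(payload) // 8}q", payload)
    return [c * step for c in codes]


def psnr(original, restored):
    """PSNR in dB, using the value range of the original data as the peak (assumed convention)."""
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(mse)


# Hypothetical smooth 1D field standing in for a scientific dataset.
data = [math.sin(0.01 * i) + 0.1 * math.cos(0.37 * i) for i in range(10_000)]
eb = 1e-3  # user-specified absolute error bound

blob = compress(data, eb)
restored = decompress(blob, eb)

max_error = max(abs(a - b) for a, b in zip(data, restored))
ratio = (8 * len(data)) / len(blob)  # input counted as 64-bit floats
quality = psnr(data, restored)
print(f"max error = {max_error:.2e}, ratio = {ratio:.1f}, PSNR = {quality:.1f} dB")
```

Rounding to the nearest multiple of `2 * eb` keeps every reconstructed value within `eb` of the original, up to floating-point rounding. That per-value guarantee is what "error-bounded" refers to. Comparing compressors "at the same error bound" or "at the same PSNR" then means fixing one of these quantities and comparing the resulting compression ratios.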



        Information

        Published In

        ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing
        June 2023
        505 pages
        ISBN:9798400700569
        DOI:10.1145/3577193
        Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. data compression
        2. high performance computing

        Qualifiers

        • Research-article

        Conference

        ICS '23
        Acceptance Rates

        Overall Acceptance Rate 629 of 2,180 submissions, 29%

        Bibliometrics

        • Downloads (last 12 months): 150
        • Downloads (last 6 weeks): 17

        Reflects downloads up to 16 Feb 2025.

        Cited By

        • (2025) Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing. Future Generation Computer Systems 163 (107323). DOI: 10.1016/j.future.2024.05.022. Online publication date: Feb-2025.
        • (2024) GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data. Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures (34-41). DOI: 10.1145/3659995.3660041. Online publication date: 3-Jun-2024.
        • (2024) High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639259. Online publication date: 26-Mar-2024.
        • (2024) SZOps: Scalar Operations for Error-bounded Lossy Compressor for Scientific Data. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (260-269). DOI: 10.1109/SCW63240.2024.00042. Online publication date: 17-Nov-2024.
        • (2024) CUSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). DOI: 10.1109/SC41406.2024.00019. Online publication date: 17-Nov-2024.
        • (2024) CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (417-429). DOI: 10.1109/IPDPS57955.2024.00044. Online publication date: 27-May-2024.
        • (2024) Attention Based Machine Learning Methods for Data Reduction with Guaranteed Error Bounds. 2024 IEEE International Conference on Big Data (BigData) (1039-1048). DOI: 10.1109/BigData62323.2024.10825655. Online publication date: 15-Dec-2024.
        • (2023) SECRE: Surrogate-Based Error-Controlled Lossy Compression Ratio Estimation Framework. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC) (132-142). DOI: 10.1109/HiPC58850.2023.00029. Online publication date: 18-Dec-2023.
        • (2023) Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks. 2023 IEEE International Conference on Big Data (BigData) (229-236). DOI: 10.1109/BigData59044.2023.10386682. Online publication date: 15-Dec-2023.
        • (2023) Exploring Wavelet Transform Usages for Error-bounded Scientific Data Compression. 2023 IEEE International Conference on Big Data (BigData) (4233-4239). DOI: 10.1109/BigData59044.2023.10386386. Online publication date: 15-Dec-2023.
