DOI: 10.1145/3502181.3531473
research-article
Public Access

Ultrafast Error-bounded Lossy Compression for Scientific Datasets

Published: 27 June 2022

Abstract

Today's scientific high-performance computing applications and advanced instruments produce vast volumes of data across a wide range of domains, imposing a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely adopted in the scientific community because it can both significantly reduce data volumes and strictly control data distortion according to a user-specified error bound. Existing lossy compressors, however, cannot offer the ultrafast compression speed demanded by numerous applications and use cases (such as in-memory compression and online instrument data compression). In this paper, we propose a novel ultrafast error-bounded lossy compressor that delivers high compression performance on both CPUs and GPUs with reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework, called SZx, that achieves ultrafast performance through a novel design comprising only lightweight operations, such as bitwise operations and addition/subtraction, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize its performance for each architecture. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2 to 16 times faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, with respect to both compression and decompression.
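To make the notion of a user-specified error bound concrete, the following is a minimal toy sketch in Python. It is not the SZx algorithm, whose details the abstract does not give. It assumes an absolute error bound `eb` and a hypothetical block-wise scheme: a block whose value range fits within `2 * eb` is replaced by its midpoint, and any other block is stored unchanged. The names `compress_blocks`, `decompress_blocks`, and `block_size` are illustrative assumptions.

```python
from typing import List, Tuple


def compress_blocks(data: List[float], eb: float, block_size: int = 4) -> List[Tuple]:
    """Toy error-bounded scheme (an assumption, not SZx itself).

    A block whose values span at most 2*eb is stored as one midpoint,
    so every value in it is reconstructed within eb (up to floating-point
    rounding). Other blocks are kept verbatim.
    """
    blocks = []
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        lo, hi = min(chunk), max(chunk)
        if hi - lo <= 2 * eb:
            # Only comparisons, a subtraction, an addition, and a halving.
            blocks.append(("const", lo + (hi - lo) / 2, len(chunk)))
        else:
            blocks.append(("raw", list(chunk)))
    return blocks


def decompress_blocks(blocks: List[Tuple]) -> List[float]:
    """Expand the block list produced by compress_blocks."""
    out: List[float] = []
    for block in blocks:
        if block[0] == "const":
            _, midpoint, count = block
            out.extend([midpoint] * count)
        else:
            out.extend(block[1])
    return out


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest pointwise absolute difference between two equal-length lists."""
    return max(abs(a - b) for a, b in zip(original, restored))


sample = [1.0, 1.25, 1.5, 1.0, 10.0, 20.0, 30.0, 40.0]
bound = 0.25
packed = compress_blocks(sample, bound)
restored = decompress_blocks(packed)
```

In this sample the first block collapses to a single midpoint while the second is stored as is, and `max_abs_error(sample, restored)` stays within `bound`. A real compressor would also encode the non-constant blocks compactly; this sketch only shows how the bound is enforced.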


Published In

HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing
June 2022
314 pages
ISBN:9781450391993
DOI:10.1145/3502181
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error-bounded lossy compression
  2. gpu
  3. high-speed compressor
  4. scientific data

Qualifiers

  • Research-article

Funding Sources

  • ARAMCO
  • U.S. Department of Energy Office of Science and Office of Advanced Scientific Computing Research (ASCR)
  • U.S. Department of Energy

Conference

HPDC '22

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
