
Fulfilling the Promises of Lossy Compression for Scientific Applications

  • Conference paper
Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI (SMC 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1315)


Abstract

Many scientific simulations, machine/deep learning applications and instruments need significant data reduction. Error-bounded lossy compression has been identified as one solution and has been tested for many use cases: reducing streaming intensity (instruments), reducing storage and memory footprints, accelerating computation, and accelerating data access and transfer. Ultimately, users' trust in lossy compression relies on the preservation of science: the same conclusions should be drawn from computations or analyses performed on lossy-compressed data as from the original data. Experience from scientific simulations, artificial intelligence (AI) and instruments reveals two points: (i) there are important gaps in our understanding of the effects of lossy-compressed data on computations, AI and analysis, and (ii) each use case, application and user has its own requirements in terms of compression ratio, speed and accuracy, and current generic, monolithic compressors do not respond well to this need for specialization. This situation calls for more research and development on lossy compression technologies. This paper addresses the most pressing research needs regarding the application of lossy compression in the scientific context.
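To make the notion of an error bound concrete, here is a minimal sketch of one way a compressor can guarantee a user-specified absolute error bound `eb`: predict each value from the previously reconstructed one and quantize the prediction residual into integer bins of width `2*eb`. It is a simplified illustration of the prediction-plus-quantization idea used by compressors such as SZ, not that compressor's actual algorithm. The previous-value predictor, the starting prediction of 0.0 and the function names `compress`/`decompress` are assumptions made for this example, and the entropy-coding stage that would actually shrink the integer codes is omitted.

```python
def compress(data, eb):
    """Return integer quantization codes whose reconstruction is within eb.

    Each value is predicted by the previously *reconstructed* value (a 1D
    previous-value predictor, assumed here), so the decompressor can repeat
    exactly the same prediction.
    """
    codes = []
    pred = 0.0  # assumed prediction for the first value
    for x in data:
        q = round((x - pred) / (2 * eb))  # nearest bin of width 2*eb
        codes.append(q)
        pred = pred + 2 * eb * q          # value the decoder will rebuild
    return codes


def decompress(codes, eb):
    """Rebuild the values by replaying the same predictions."""
    out = []
    pred = 0.0
    for q in codes:
        pred = pred + 2 * eb * q
        out.append(pred)
    return out


if __name__ == "__main__":
    values = [0.0, 0.51, 1.02, 1.49, 2.1, 2.05]
    eb = 0.01
    rec = decompress(compress(values, eb), eb)
    # Maximum point-wise error; should not exceed eb (up to float rounding).
    print(max(abs(a - b) for a, b in zip(values, rec)))
```

Predicting from the reconstructed value rather than the original keeps compressor and decompressor in lockstep, so quantization errors do not accumulate from one point to the next: each point's error stays within `eb`, up to floating-point rounding.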


Notes

  1. Lossy compressors can even achieve compression ratios of 100:1 for visualization purposes, when high accuracy is not needed.

  2. Bit-rate is defined as the average number of bits used to represent each data point after compression. Hence, the smaller the bit-rate, the higher the compression ratio.

  3. There is even an effort to standardize the application programming interface (API).
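Footnotes 1 and 2 are linked by a simple relation: for data originally stored with b bits per value (32 for single precision, 64 for double precision), the compression ratio equals b divided by the bit-rate. The sketch below assumes uncompressed IEEE floating-point input; the helper names `bit_rate`, `compression_ratio` and `rate_for_ratio` are hypothetical. It shows, for instance, that a 100:1 ratio on single-precision data corresponds to a bit-rate of 0.32 bits per value.

```python
def bit_rate(compressed_bytes: int, num_values: int) -> float:
    """Average number of bits per data point after compression."""
    return 8 * compressed_bytes / num_values


def compression_ratio(original_bits_per_value: int, rate: float) -> float:
    """Compression ratio implied by a bit-rate (e.g. 32 bits for float32)."""
    return original_bits_per_value / rate


def rate_for_ratio(original_bits_per_value: int, ratio: float) -> float:
    """Bit-rate needed to reach a target compression ratio."""
    return original_bits_per_value / ratio


# One million float32 values (4,000,000 bytes) compressed to 250,000 bytes:
r = bit_rate(250_000, 1_000_000)
print(r, compression_ratio(32, r))  # 2.0 16.0, same as 4_000_000 / 250_000
print(rate_for_ratio(32, 100))      # 0.32 bits per value for 100:1 on float32
```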


Acknowledgments

The co-authors wish to thank (in alphabetical order) Mark Ainsworth, Julie Bessac, Jon Calhoun, Ozan Tugluk and Robert Underwood for fruitful discussions within the ECP CODAR project. This research was supported by the ECP, Project Number 17-SC-20-SC, a collaborative effort of two DOE organizations, the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation's exascale computing imperative. This material is based upon work supported by the DOE, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant Nos. 1763540, 1617488 and 2003709. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory. This research also used computing resources of the Argonne Leadership Computing Facility.

Author information

Corresponding author: Franck Cappello.


Copyright information

© 2020 UChicago Argonne, LLC

About this paper


Cite this paper

Cappello, F., Di, S., Gok, A.M. (2020). Fulfilling the Promises of Lossy Compression for Scientific Applications. In: Nichols, J., Verastegui, B., Maccabe, A.‘., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_7


  • DOI: https://doi.org/10.1007/978-3-030-63393-6_7
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-030-63392-9
  • Online ISBN: 978-3-030-63393-6
  • eBook Packages: Computer Science, Computer Science (R0)
