Fulfilling the Promises of Lossy Compression for Scientific Applications

Cappello, Franck; Di, Sheng; Gok, Ali Murat

doi:10.1007/978-3-030-63393-6_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1315)

Included in the following conference series:

Smoky Mountains Computational Sciences and Engineering Conference

Abstract

Many scientific simulations, machine/deep learning applications and instruments need significant data reduction. Error-bounded lossy compression has been identified as one solution and has been tested for many use cases: reducing streaming intensity (instruments), reducing storage and memory footprints, accelerating computation, and accelerating data access and transfer. Ultimately, users' trust in lossy compression relies on the preservation of science: the same conclusions should be drawn from computations or analyses performed on lossy-compressed data. Experience from scientific simulations, Artificial Intelligence (AI) and instruments reveals several points: (i) there are important gaps in the understanding of the effects of lossy-compressed data on computations, AI and analysis; (ii) each use case, application and user has its own requirements in terms of compression ratio, speed and accuracy, and current generic monolithic compressors do not respond well to this need for specialization. This situation calls for more research and development on lossy compression technologies. This paper addresses the most pressing research needs regarding the application of lossy compression in the scientific context.

Notes

1. Lossy compressors can even achieve compression ratios of 100:1 for visualization purposes, if high accuracy is not needed.
2. Bit-rate is defined as the average number of bits used to represent each data point after compression. That is, the smaller the bit-rate, the higher the compression ratio.
3. There is even an effort to standardize the application programming interface (API).
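
The relationship between bit-rate and compression ratio stated in note 2 can be illustrated with a minimal sketch. The function names and the example sizes below are hypothetical and chosen only for illustration; the sketch assumes every data point has the same uncompressed width (for example, 32 bits for single precision).

```python
def bit_rate(compressed_bytes: int, num_points: int) -> float:
    """Average number of bits per data point after compression."""
    return 8 * compressed_bytes / num_points


def compression_ratio(original_bits_per_point: int, rate: float) -> float:
    """Ratio of the uncompressed width of a data point to its bit-rate."""
    return original_bits_per_point / rate


# Hypothetical example: 1,000,000 single-precision values (32 bits each)
# compressed down to 250,000 bytes.
num_points = 1_000_000
rate = bit_rate(250_000, num_points)   # 2.0 bits per data point
ratio = compression_ratio(32, rate)    # 16.0, i.e. a 16:1 compression ratio
print(f"bit-rate = {rate} bits/point, compression ratio = {ratio}:1")
```

Halving the bit-rate doubles the compression ratio, which is why results reported as bit-rate and as compression ratio carry the same information once the original data width is known.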

Acknowledgments

The co-authors wish to thank (in alphabetical order): Mark Ainsworth, Julie Bessac, Jon Calhoun, Ozan Tugluk and Robert Underwood for the fruitful discussions within the ECP CODAR project. This research was supported by the ECP, Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation's exascale computing imperative. The material was based upon work supported by the DOE, Office of Science, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant No. 1763540, Grant No. 1617488 and Grant No. 2003709. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory. This research also used computing resources of the Argonne Leadership Computing Facility.

Author information

Authors and Affiliations

Argonne National Laboratory, Lemont, USA
Franck Cappello, Sheng Di & Ali Murat Gok

Corresponding author

Correspondence to Franck Cappello.

Editor information

Editors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, TN, USA
Jeffrey Nichols
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Becky Verastegui
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Arthur ‘Barney’ Maccabe
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Oscar Hernandez
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Suzanne Parete-Koon
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Theresa Ahearn

Cite this paper

Cappello, F., Di, S., Gok, A.M. (2020). Fulfilling the Promises of Lossy Compression for Scientific Applications. In: Nichols, J., Verastegui, B., Maccabe, A., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_7

DOI: https://doi.org/10.1007/978-3-030-63393-6_7
Published: 18 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63392-9
Online ISBN: 978-3-030-63393-6
eBook Packages: Computer Science, Computer Science (R0)
