Abstract
Many scientific simulations, machine/deep learning applications and instruments are in need of significant data reduction. Error-bounded lossy compression has been identified as one solution and has been tested for many use-cases: reducing streaming intensity (instruments), reducing storage and memory footprints, accelerating computation and accelerating data access and transfer. Ultimately, users' trust in lossy compression relies on the preservation of science: the same conclusions should be drawn from computations or analyses performed on lossy compressed data. Experience from scientific simulations, Artificial Intelligence (AI) and instruments reveals several points: (i) there are important gaps in the understanding of the effects of lossy compressed data on computations, AI and analysis, (ii) each use-case, application and user has its own requirements in terms of compression ratio, speed and accuracy, and current generic monolithic compressors are not responding well to this need for specialization. This situation calls for more research and development on lossy compression technologies. This paper addresses the most pressing research needs regarding the application of lossy compression in the scientific context.
Notes
1. Lossy compressors can even achieve compression ratios of 100:1 for visualization purposes, if high accuracy is not needed.
2. Bit-rate is defined as the average number of bits used to represent each data point after compression. That is, the smaller the bit-rate, the higher the compression ratio.
3. There is even an effort to standardize the application programming interface (API).
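The relationship between bit-rate and compression ratio stated in the second note can be sketched as follows. This is a minimal illustration, not code from the paper: the function names and the example sizes are hypothetical, and it assumes each uncompressed value occupies a fixed number of bits (32 for single precision).

```python
def bit_rate(compressed_bytes: int, num_points: int) -> float:
    """Average number of bits used per data point after compression."""
    return 8.0 * compressed_bytes / num_points


def compression_ratio(bits_per_value: int, rate: float) -> float:
    """Ratio of original size to compressed size for fixed-width values."""
    return bits_per_value / rate


# Hypothetical example: 1,000,000 float32 values (4,000,000 bytes)
# compressed down to 500,000 bytes.
num_points = 1_000_000
rate = bit_rate(500_000, num_points)   # bits per value after compression
ratio = compression_ratio(32, rate)    # original bits / compressed bits

print(f"bit-rate: {rate} bits/value, compression ratio: {ratio}:1")
```

Halving the bit-rate doubles the compression ratio, which is why the two quantities are used interchangeably when comparing compressors on data of a single type.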
Acknowledgments
The co-authors wish to thank (in alphabetical order): Mark Ainsworth, Julie Bessac, Jon Calhoun, Ozan Tugluk and Robert Underwood for fruitful discussions within the ECP CODAR project. This research was supported by the ECP, Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was based upon work supported by the DOE, Office of Science, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant No. 1763540, Grant No. 1617488 and Grant No. 2003709. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory. This research also used computing resources of the Argonne Leadership Computing Facility.
Copyright information
© 2020 UChicago Argonne, LLC
About this paper
Cite this paper
Cappello, F., Di, S., Gok, A.M. (2020). Fulfilling the Promises of Lossy Compression for Scientific Applications. In: Nichols, J., Verastegui, B., Maccabe, A., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_7
DOI: https://doi.org/10.1007/978-3-030-63393-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63392-9
Online ISBN: 978-3-030-63393-6
eBook Packages: Computer Science, Computer Science (R0)