Abstract
The volume of climate data has increased dramatically in recent years, posing considerable challenges for data storage, archiving and sharing. In this paper, we propose czip, a lossless compression algorithm for climate data. czip efficiently eliminates data redundancy through several new methods, including adaptive prediction, exclusive-OR (XOR) differencing, multiway compression and static regions. To exploit the multiple cores available on modern computers, czip is implemented in parallel. Experimental results show that czip achieves outstanding compression ratios as well as high deflating (compression) and inflating (decompression) throughputs: on a server with 16 cores, czip reaches a deflating throughput of 800 MB/s and an inflating throughput of over 2600 MB/s.
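To make the combination of adaptive prediction and XOR differencing more concrete, the sketch below shows a generic version of the idea for 32-bit floating-point values. It is not czip's implementation: the two predictors (last value, and linear extrapolation on the raw bit patterns), the per-value selector, and the helper names `encode`, `decode` and `leading_zero_bytes` are illustrative assumptions. Each value's IEEE-754 bit pattern is XORed with a prediction. When the prediction is close, the residual has many leading zero bits that a later stage need not store.

```python
import struct
from typing import List, Tuple


def to_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit unsigned integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 32-bit unsigned integer pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def predictions(p1: int, p2: int) -> Tuple[int, int]:
    """Hypothetical predictor pair: 0 = last value, 1 = linear extrapolation
    on the integer bit patterns (wrapped to 32 bits so it never overflows)."""
    return p1, (2 * p1 - p2) & 0xFFFFFFFF


def encode(values: List[float]) -> List[Tuple[int, int]]:
    """Return (selector, XOR residual) pairs. For each value, the predictor
    giving the smaller residual (hence at least as many leading zeros) is
    chosen adaptively, and its index is stored so decoding is lossless."""
    out = []
    p1 = p2 = 0  # previous and second-previous bit patterns
    for v in values:
        bits = to_bits(v)
        residuals = [bits ^ c for c in predictions(p1, p2)]
        sel = 0 if residuals[0] <= residuals[1] else 1
        out.append((sel, residuals[sel]))
        p1, p2 = bits, p1
    return out


def decode(encoded: List[Tuple[int, int]]) -> List[float]:
    """Invert encode(): recompute the same prediction and undo the XOR."""
    values = []
    p1 = p2 = 0
    for sel, r in encoded:
        bits = r ^ predictions(p1, p2)[sel]
        values.append(from_bits(bits))
        p1, p2 = bits, p1
    return values


def leading_zero_bytes(r: int) -> int:
    """Count leading zero bytes in a 32-bit residual (bytes that need not be stored)."""
    n = 0
    for shift in (24, 16, 8, 0):
        if (r >> shift) & 0xFF:
            break
        n += 1
    return n
```

For example, `encode([280.5, 280.5, 281.0])` yields a zero residual for the repeated value, and the residual for 281.0 keeps only a few low mantissa bits. This is why smooth climate fields tend to compress well under prediction plus XOR differencing.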
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their valuable comments. This study was supported by funding from the National Natural Science Foundation of China (41375102), the National Grand Fundamental Research 973 Program of China (No. 2014CB347800), and the National High Technology Development Program of China (2011AA01A203).
Cite this article
Huang, X., Ni, Y., Chen, D. et al. Czip: A Fast Lossless Compression Algorithm for Climate Data. Int J Parallel Prog 44, 1248–1267 (2016). https://doi.org/10.1007/s10766-016-0403-z