DOI: 10.1145/3332466.3374525

waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data

Published: 19 February 2020

Abstract

Error-bounded lossy compression is critical to the success of extreme-scale scientific research because of the ever-increasing volumes of data produced by today's high-performance computing (HPC) applications. Error-controlled lossy compressors not only significantly reduce the I/O and storage burden but also retain high data fidelity for post-analysis. Existing state-of-the-art lossy compressors, however, generally suffer from relatively low compression and decompression throughput (up to hundreds of megabytes per second on a single CPU core), which considerably restricts the adoption of lossy compression by many HPC applications, especially those with a fairly high data production rate. In this paper, we propose a highly efficient lossy compression approach based on field-programmable gate arrays (FPGAs) under the state-of-the-art lossy compression model SZ. Our contributions are fourfold. (1) We adopt a wavefront memory layout to alleviate the data dependency during prediction for higher-dimensional predictors, such as the Lorenzo predictor. (2) We propose a co-design framework named waveSZ, based on the wavefront memory layout and the characteristics of the SZ algorithm, and carefully implement it by using high-level synthesis. (3) We propose a hardware-algorithm co-optimization method to improve the performance. (4) We evaluate our proposed waveSZ on three real-world HPC simulation datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and FPGAs. Experiments show that waveSZ improves SZ's compression throughput by 6.9X to 8.7X over the production version running on a state-of-the-art CPU and improves the compression ratio and throughput by 2.1X and 5.8X on average, respectively, compared with the state-of-the-art FPGA design.
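
The wavefront idea in contribution (1) can be illustrated with a small software sketch. It is not the waveSZ hardware design. It assumes a first-order 2D Lorenzo predictor and a linear-scaling quantizer with absolute error bound `eb`, and it omits the handling of unpredictable points and the entropy coding stage. The function names are hypothetical. Each point is predicted from its already reconstructed upper, left, and upper-left neighbors, so all points on one anti-diagonal (`i + j == k`) depend only on earlier anti-diagonals and could be processed independently of one another.

```python
import math


def lorenzo_wavefront_compress(data, eb):
    """Quantize a 2D grid with a Lorenzo predictor, visiting points one
    anti-diagonal (wavefront) at a time. Returns (codes, reconstruction)."""
    rows, cols = len(data), len(data[0])
    recon = [[0.0] * cols for _ in range(rows)]
    codes = [[0] * cols for _ in range(rows)]

    def at(i, j):
        # Neighbors outside the grid are treated as zero (an assumption).
        return recon[i][j] if i >= 0 and j >= 0 else 0.0

    for k in range(rows + cols - 1):  # wavefront index, k = i + j
        # Points within this loop do not depend on each other.
        for i in range(max(0, k - cols + 1), min(rows, k + 1)):
            j = k - i
            pred = at(i - 1, j) + at(i, j - 1) - at(i - 1, j - 1)
            code = round((data[i][j] - pred) / (2 * eb))
            codes[i][j] = code
            recon[i][j] = pred + 2 * eb * code  # what the decoder will see
    return codes, recon


def lorenzo_wavefront_decompress(codes, eb):
    """Rebuild the grid from quantization codes in the same wavefront order."""
    rows, cols = len(codes), len(codes[0])
    recon = [[0.0] * cols for _ in range(rows)]

    def at(i, j):
        return recon[i][j] if i >= 0 and j >= 0 else 0.0

    for k in range(rows + cols - 1):
        for i in range(max(0, k - cols + 1), min(rows, k + 1)):
            j = k - i
            pred = at(i - 1, j) + at(i, j - 1) - at(i - 1, j - 1)
            recon[i][j] = pred + 2 * eb * codes[i][j]
    return recon


# Usage on a small synthetic grid (illustrative data, not from the paper).
grid = [[math.sin(0.3 * i) + math.cos(0.2 * j) for j in range(5)] for i in range(4)]
codes, recon = lorenzo_wavefront_compress(grid, eb=1e-2)
restored = lorenzo_wavefront_decompress(codes, eb=1e-2)
```

Because the predictor reads reconstructed values rather than the original ones, the decoder reproduces the same predictions, and each restored value stays within `eb` of the original up to floating-point rounding.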

    Published In

    PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2020
    454 pages
    ISBN: 9781450368186
    DOI: 10.1145/3332466
    © 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication Notes

    Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

    Author Tags

    1. FPGA
    2. compression ratio
    3. lossy compression
    4. scientific data
    5. software-hardware co-design
    6. throughput

    Qualifiers

    • Research-article

    Conference

    PPoPP '20

    Acceptance Rates

    PPoPP '20 Paper Acceptance Rate 28 of 121 submissions, 23%;
    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 283
    • Downloads (last 6 weeks): 42
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2024) CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 309-321. DOI: 10.1145/3625549.3658691. Online publication date: 3-Jun-2024
    • (2024) cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-18. DOI: 10.1109/SC41406.2024.00021. Online publication date: 17-Nov-2024
    • (2024) Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 373-385. DOI: 10.1109/IPDPS57955.2024.00040. Online publication date: 27-May-2024
    • (2024) Accelerating memory and I/O intensive HPC applications using hardware compression. Journal of Parallel and Distributed Computing 193:C. DOI: 10.1016/j.jpdc.2024.104955. Online publication date: 1-Nov-2024
    • (2023) Streaming Hardware Compressor Generator Framework. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 289-297. DOI: 10.1145/3624062.3625126. Online publication date: 12-Nov-2023
    • (2023) A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data. IEEE Transactions on Big Data 9:3, 949-963. DOI: 10.1109/TBDATA.2022.3225959. Online publication date: 1-Jun-2023
    • (2023) High-Ratio Lossy Compression: Exploring the Autoencoder to Compress Scientific Data. IEEE Transactions on Big Data 9:1, 22-36. DOI: 10.1109/TBDATA.2021.3066151. Online publication date: 1-Feb-2023
    • (2023) Parallelizing Stream Compression for IoT Applications on Asymmetric Multicores. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 950-964. DOI: 10.1109/ICDE55515.2023.00078. Online publication date: Apr-2023
    • (2023) Characterizing Lossy and Lossless Compression on Emerging BlueField DPU Architectures. 2023 IEEE Symposium on High-Performance Interconnects (HOTI), 33-40. DOI: 10.1109/HOTI59126.2023.00019. Online publication date: Aug-2023
    • (2022) CEAZ. Proceedings of the 36th ACM International Conference on Supercomputing, 1-13. DOI: 10.1145/3524059.3532362. Online publication date: 28-Jun-2022
