Abstract
The training process of deep neural networks (DNNs) is usually pipelined, with stages for data preparation on CPUs followed by gradient computation on accelerators such as GPUs. In an ideal pipeline, the end-to-end training throughput is ultimately limited by the throughput of the accelerator, not by that of data preparation. In the past, the DNN training pipeline achieved near-optimal throughput by utilizing datasets encoded in a lightweight, lossy image format like JPEG. However, as high-resolution, losslessly encoded datasets become more popular for applications requiring high accuracy, a performance problem arises in the data preparation stage due to low-throughput image decoding on the CPU. Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. The decoding process of L3 is effectively parallelized on the accelerator, minimizing CPU intervention for data preparation during DNN training. L3 achieves 9.29× higher data preparation throughput than PNG, the most popular lossless image format, for the Cityscapes dataset on an NVIDIA A100 GPU, which leads to 1.71× higher end-to-end training throughput. Compared to JPEG and WebP, two popular lossy image formats, L3 provides up to 1.77× and 2.87× higher end-to-end training throughput for ImageNet, respectively, at equivalent metric performance.
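The bottleneck the abstract describes can be illustrated with a back-of-the-envelope model (a sketch for illustration only, not taken from the paper): in a fully overlapped two-stage pipeline, steady-state end-to-end throughput is bounded by the slower of the two stages, so a slow CPU-side decoder caps training throughput regardless of how fast the accelerator is. The specific numbers below are hypothetical.

```python
def pipeline_throughput(prep_imgs_per_s: float, gpu_imgs_per_s: float) -> float:
    """Steady-state throughput of a fully overlapped two-stage pipeline:
    the minimum of the per-stage throughputs (images/second)."""
    return min(prep_imgs_per_s, gpu_imgs_per_s)

# Hypothetical stage throughputs, chosen only to illustrate the effect:
gpu = 1000.0          # images/s the accelerator can consume
slow_decode = 200.0   # CPU-bound lossless decode (e.g., PNG-like)
fast_decode = 1860.0  # a much faster data-preparation stage (e.g., ~9.29x)

# With slow decode, data preparation is the bottleneck:
print(pipeline_throughput(slow_decode, gpu))  # 200.0
# With fast decode, the accelerator becomes the bottleneck, as desired:
print(pipeline_throughput(fast_decode, gpu))  # 1000.0
```

Speeding up data preparation beyond the accelerator's consumption rate yields no further gain; the goal is simply to move the bottleneck back to the accelerator.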
Acknowledgements
This work was supported by the SNU-SK Hynix Solution Research Center (S3RC) and a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2020R1A2C3010663). The source code is available at https://github.com/SNU-ARC/L3.git.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bae, J., Baek, W., Ham, T.J., Lee, J.W. (2022). L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_11
DOI: https://doi.org/10.1007/978-3-031-20083-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20082-3
Online ISBN: 978-3-031-20083-0
eBook Packages: Computer Science (R0)