poster

Transparent GPU memory management for DNNs

Authors: Jungho Park, Hyungmin Cho, Wookeun Jung, Jaejin Lee

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 411 - 412

https://doi.org/10.1145/3178487.3178531

Published: 10 February 2018

Abstract

Modern DNN frameworks use GPU acceleration by default to achieve high performance. As DNNs become deeper and larger, limited GPU memory capacity becomes a serious problem. This paper proposes tvDNN, a purely software-based, transparent solution to the GPU memory capacity problem. tvDNN is based on GPU memory swapping and memory-object sectioning. It also provides an efficient memory-object swapping schedule, computed either by ILP (optimal) or by heuristics (suboptimal). Experimental results show that tvDNN enables Caffe to build VGG-16 with a large batch size, such as 256 or 512, using a few GB of GPU memory without significant performance degradation.
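The effect of swapping on peak GPU memory can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not tvDNN's actual algorithm or schedule: each layer is assumed to produce one memory object (its output feature map), and with swapping only the running layer's input and output are assumed to be resident on the GPU. The function names and the sizes are hypothetical and do not come from the paper.

```python
from typing import Sequence


def peak_without_swapping(sizes_mb: Sequence[int]) -> int:
    """Peak GPU memory (MB) when every memory object stays resident.

    Toy assumption: each layer's output is kept on the GPU until the
    backward pass, so the peak is reached at the end of the forward pass.
    """
    return sum(sizes_mb)


def peak_with_swapping(sizes_mb: Sequence[int]) -> int:
    """Peak GPU memory (MB) when idle memory objects are swapped to the host.

    Toy assumption: while layer i runs, only its input (object i-1) and
    its output (object i) are resident; all other objects are in host memory.
    """
    peak = 0
    for i, out_size in enumerate(sizes_mb):
        in_size = sizes_mb[i - 1] if i > 0 else 0
        peak = max(peak, in_size + out_size)
    return peak


if __name__ == "__main__":
    # Hypothetical per-layer object sizes in MB (not measured values).
    sizes = [800, 800, 400, 400, 200, 100]
    print(peak_without_swapping(sizes))  # 2700
    print(peak_with_swapping(sizes))     # 1600
```

The model ignores transfer time, gradients, and weights. A real swapping schedule, such as the ILP-based or heuristic schedules the abstract mentions, would also have to decide when each transfer happens so that it overlaps with computation.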

Cited By

Yin M, Xu X, Zhang T, Ye C. 2021. Performance Evaluation Model for Matrix Calculation on GPU. International Journal of Pattern Recognition and Artificial Intelligence 35:15. Online publication date: 15-Oct-2021. https://doi.org/10.1142/S0218001421540306

Kwon Y, Rhu M, Oskin M, Inoue K. 2018. Beyond the memory wall. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 148-161. Online publication date: 20-Oct-2018. https://dl.acm.org/doi/10.1109/MICRO.2018.00021

Awan A, Chu C, Subramoni H, Lu X, Panda D. 2018. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 143-152. Online publication date: Dec-2018. https://doi.org/10.1109/HiPC.2018.00024

Index Terms

Transparent GPU memory management for DNNs
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Allocation / deallocation strategies
        Virtual memory

Published In

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2018

442 pages

ISBN:9781450349826

DOI:10.1145/3178487

General Chair: Andreas Krall, Vienna University of Technology, Austria
Program Chair: Thomas R. Gross, ETH Zürich, Switzerland

ACM SIGPLAN Notices Volume 53, Issue 1
PPoPP '18
January 2018
426 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3200691
Editor: Matthew Fluet, Rochester Institute of Technology

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Funding Sources

National Research Foundation of Korea

Conference

PPoPP '18

PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 24 - 28, 2018

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
