DOI: 10.1145/3178487.3178528
Poster

Layrub: layer-centric GPU memory reuse and data migration in extreme-scale deep learning systems

Published: 10 February 2018

Abstract

The growing accuracy and robustness of Deep Neural Network (DNN) models are accompanied by growing model capacity (going deeper or wider). However, the high memory requirements of these models make it difficult to execute the training process on a single GPU. To address this, we first identify the memory usage characteristics of deep and wide convolutional networks, and demonstrate opportunities for memory reuse at both the intra-layer and inter-layer levels. We then present Layrub, a runtime data placement strategy that orchestrates the execution of the training process. It achieves layer-centric reuse to reduce memory consumption, enabling extreme-scale deep learning models that otherwise cannot be trained on a single GPU.
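
The sketch below is a minimal, hypothetical illustration of the two ideas the abstract names, inter-layer buffer reuse and migration of activations off the GPU, and is not the authors' Layrub implementation. BufferPool and forward_pass are invented names, and NumPy arrays stand in for GPU buffers.

    # Hypothetical sketch (not the authors' Layrub code): a layer's activation
    # buffer is released as soon as the next layer has consumed it, so a later
    # layer can reuse it; a host-side copy stands in for data migration.
    import numpy as np

    class BufferPool:
        """Hands out equally-shaped scratch buffers instead of allocating one per layer."""
        def __init__(self):
            self.free = {}  # shape -> list of released buffers available for reuse

        def acquire(self, shape):
            spares = self.free.get(shape, [])
            return spares.pop() if spares else np.empty(shape, dtype=np.float32)

        def release(self, buf):
            self.free.setdefault(buf.shape, []).append(buf)

    def forward_pass(layer_shapes, pool):
        """Forward pass that migrates each consumed activation to host memory
        and returns its device buffer to the pool for reuse by a later layer."""
        host_copies, prev = [], None
        for shape in layer_shapes:
            act = pool.acquire(shape)            # "device" activation buffer (reused if possible)
            act.fill(1.0)                        # stand-in for the layer's forward computation
            if prev is not None:
                host_copies.append(prev.copy())  # migrate the old activation to host memory
                pool.release(prev)               # its device buffer can now serve a later layer
            prev = act
        return host_copies                       # restored in reverse order during backward

    pool = BufferPool()
    forward_pass([(32, 64, 56, 56)] * 4, pool)   # four layers, at most two device buffers live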

      Published In

      PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2018, 442 pages
      ISBN: 9781450349826
      DOI: 10.1145/3178487

      Also published in ACM SIGPLAN Notices, Volume 53, Issue 1 (PPoPP '18), January 2018, 426 pages
      ISSN: 0362-1340, EISSN: 1558-1160
      DOI: 10.1145/3200691

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. DNN
      2. GPU
      3. data placement
      4. memory efficiency

      Qualifiers

      • Poster

      Funding Sources

      • National Natural Science Foundation of China

      Conference

      PPoPP '18

      Acceptance Rates

      Overall Acceptance Rate 230 of 1,014 submissions, 23%
