
An Efficient Training Framework for Reversible Neural Architectures

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12372)

Abstract

As machine learning models and datasets grow rapidly in scale, their large memory footprint impedes efficient training. Reversible operators reduce memory consumption by discarding intermediate feature maps during the forward computation and recovering them via their inverse functions during backpropagation, saving memory at the cost of extra computation. However, current implementations of reversible layers focus on reducing memory usage while neglecting this computation overhead. In this work, we formulate the decision problem for reversible operators with training time as the objective function and memory usage as the constraint. By solving this problem, we maximize the training throughput of reversible neural architectures. Our proposed framework fully automates this decision process, empowering researchers to develop and train reversible neural networks more efficiently.
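
To make the memory-for-compute trade-off concrete, the sketch below shows an additive coupling block, the pattern underlying reversible layers such as those provided by MemCNN (footnote 1). It is a minimal illustration, not the authors' implementation: the names coupling_forward and coupling_inverse and the toy sub-networks F and G are ours. Because the outputs fully determine the inputs, the forward pass may discard its inputs and reconstruct them during backpropagation, paying one extra evaluation of F and G per block.

    import numpy as np

    def coupling_forward(x1, x2, F, G):
        # Forward pass: (x1, x2) need not be stored, since the outputs
        # fully determine them.
        y1 = x1 + F(x2)
        y2 = x2 + G(y1)
        return y1, y2

    def coupling_inverse(y1, y2, F, G):
        # Backward pass: reconstruct the discarded inputs from the outputs
        # at the cost of one extra call to F and to G.
        x2 = y2 - G(y1)
        x1 = y1 - F(x2)
        return x1, x2

    rng = np.random.default_rng(0)
    W_f = rng.standard_normal((4, 4))
    W_g = rng.standard_normal((4, 4))
    F = lambda h: np.tanh(h @ W_f)  # toy stand-in sub-network
    G = lambda h: np.tanh(h @ W_g)  # toy stand-in sub-network

    x1 = rng.standard_normal((2, 4))
    x2 = rng.standard_normal((2, 4))
    y1, y2 = coupling_forward(x1, x2, F, G)
    r1, r2 = coupling_inverse(y1, y2, F, G)
    assert np.allclose(x1, r1) and np.allclose(x2, r2)  # inputs recovered exactly

Deciding which blocks to run in this reversible mode under a given memory budget is then a constrained selection problem: each candidate block offers a memory saving at the price of recomputation, and the framework chooses the combination that minimizes training time subject to the memory constraint, as stated in the abstract.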


Notes

  1. https://github.com/silvandeleemput/memcnn.

  2. https://github.com/mapillary/inplace_abn.

  3. https://github.com/lucidrains/reformer-pytorch.


Author information


Correspondence to Zixuan Jiang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Jiang, Z., Zhu, K., Liu, M., Gu, J., Pan, D.Z. (2020). An Efficient Training Framework for Reversible Neural Architectures. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12372. Springer, Cham. https://doi.org/10.1007/978-3-030-58583-9_17


  • DOI: https://doi.org/10.1007/978-3-030-58583-9_17


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58582-2

  • Online ISBN: 978-3-030-58583-9

  • eBook Packages: Computer Science, Computer Science (R0)
