Abstract
As machine learning models and datasets rapidly grow in scale, their large memory footprint becomes a major obstacle to efficient training. Reversible operators reduce memory consumption by discarding intermediate feature maps during the forward pass and recovering them via inverse functions during backpropagation, trading extra computation for memory savings. However, current implementations of reversible layers focus on reducing memory usage while neglecting this computation overhead. In this work, we formulate the decision problem for reversible operators with training time as the objective function and memory usage as the constraint. Solving this problem maximizes the training throughput of reversible neural architectures. Our proposed framework fully automates the decision process, enabling researchers to develop and train reversible neural networks more efficiently.
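To make the recover-by-inversion mechanism concrete, here is a minimal PyTorch sketch of a RevNet-style additive coupling block. This is an illustrative reading of how reversible operators work in general, not the paper's implementation; the class name and the assumption that `f` and `g` preserve tensor shape are ours.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """RevNet-style reversible block: the input can be reconstructed
    exactly from the output, so intermediate activations need not be
    stored for the backward pass."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g  # must preserve their input's shape

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)   # channel count must be even
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)         # undo the second residual step
        x1 = y1 - self.f(x2)         # then the first
        return torch.cat([x1, x2], dim=1)
```

For shape-preserving `f` and `g`, `block.inverse(block(x))` recovers `x` up to floating-point error, which is exactly what permits discarding the block's input in the forward pass at the cost of re-evaluating `f` and `g` during backpropagation.

The decision problem itself is not spelled out in the abstract. One plausible reading, given training time as the objective and memory usage as the constraint, is a 0-1 covering knapsack over layers: making layer i reversible saves memory m_i but adds recomputation time t_i. The sketch below is a hypothetical instance of that formulation; `mem_saved`, `time_cost`, and `mem_needed` are illustrative per-layer profiling quantities, not values the paper defines.

```python
def choose_reversible(mem_saved, time_cost, mem_needed):
    """Illustrative covering-knapsack sketch: select which layers run
    reversibly so that the total memory saved is at least `mem_needed`
    while the total recomputation overhead is minimized.

    dp[i][s] = minimum overhead using the first i layers to save at
    least s memory units (s capped at mem_needed)."""
    n, INF = len(mem_saved), float("inf")
    dp = [[INF] * (mem_needed + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        m, t = mem_saved[i - 1], time_cost[i - 1]
        for s in range(mem_needed + 1):
            dp[i][s] = dp[i - 1][s]              # layer stays conventional
            prev = max(0, s - m)                 # or it runs reversibly
            if dp[i - 1][prev] + t < dp[i][s]:
                dp[i][s] = dp[i - 1][prev] + t
    if dp[n][mem_needed] == INF:
        return INF, []                           # budget is unreachable
    # Backtrack one optimal set of reversible layers.
    chosen, s = [], mem_needed
    for i in range(n, 0, -1):
        if dp[i][s] != dp[i - 1][s]:             # value came from picking
            chosen.append(i - 1)
            s = max(0, s - mem_saved[i - 1])
    return dp[n][mem_needed], sorted(chosen)
```

Under this reading, an automated framework would profile each layer's recomputation time and activation memory, solve the selection problem once, and then instantiate only the chosen layers as reversible blocks.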