Saving Memory Space in Deep Neural Networks by Recomputing: A Survey

Conference paper
Reversible Computation (RC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13960)

Abstract

Training a multilayered neural network involves executing the network on the training data, calculating the error between the predicted and the actual output, and then performing backpropagation to update the network’s weights so as to minimise the overall error. This process is repeated many times, with the network updating its weights until it produces the desired output with a satisfactory level of accuracy. Training requires storing in memory the activation and gradient data for each layer during each training run of the network. This paper surveys the main approaches to recomputing the needed activation and gradient data instead of storing it in memory. We discuss how these approaches relate to reversible computation techniques.
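
The recompute-instead-of-store idea can be made concrete with a short sketch. The code below is a minimal illustration, not taken from the paper: the layer shape, the `stride` checkpoint spacing, and the toy loss gradient are all assumptions. During the forward pass only every `stride`-th layer input is kept; during backpropagation the discarded activations are recomputed from the nearest stored checkpoint.

```python
import numpy as np

def forward_layer(W, x):
    # One layer: ReLU(W x).
    return np.maximum(W @ x, 0.0)

def train_step(Ws, x, grad_out, stride=2):
    # Forward pass: keep only every `stride`-th layer input.
    checkpoints = {}
    a = x
    for i, W in enumerate(Ws):
        if i % stride == 0:
            checkpoints[i] = a
        a = forward_layer(W, a)
    out = a

    # Backward pass: recompute discarded activations on demand.
    grad = grad_out                      # dL/d(out), supplied by the loss
    grads_W = [None] * len(Ws)
    for i in reversed(range(len(Ws))):
        j = (i // stride) * stride       # nearest stored checkpoint
        a = checkpoints[j]
        for k in range(j, i):            # replay layers j..i-1
            a = forward_layer(Ws[k], a)
        z = Ws[i] @ a
        dz = grad * (z > 0.0)            # gradient through ReLU
        grads_W[i] = np.outer(dz, a)     # dL/dW_i
        grad = Ws[i].T @ dz              # propagate to layer i's input
    return out, grads_W

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 8)) for _ in range(6)]
x = rng.standard_normal(8)
# For the toy loss L = sum(out), dL/dout is a vector of ones.
out, grads = train_step(Ws, x, grad_out=np.ones(8))
```

Storing every s-th activation keeps roughly L/s checkpoints instead of L activations for an L-layer network, at the price of recomputing up to s-1 layers per backward step. Reversible architectures such as RevNets [16] push this further: their layers are invertible, so activations can be reconstructed exactly from the layer outputs and no checkpoints are needed at all.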

I. Ulidowski has been partially supported by JSPS Fellowship grant S21050.

Notes

  1. A detailed explanation of how the algorithm works is given in [8].

  2. This operation, which is depicted in Fig. 3 and given in (5), is a cross-correlation, although it is commonly called a convolution or convolution layer transformation. The true convolution operation is denoted by \(*\) and is defined in (8) below.
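
To make this distinction concrete, the following sketch (my own illustration with a made-up input and kernel; it does not reproduce definitions (5) and (8) from the paper) computes both operations on a small 2D array. The "convolution" of a CNN layer slides the kernel over the input as-is, which is cross-correlation; the true convolution \(*\) first flips the kernel in both spatial dimensions.

```python
import numpy as np

def cross_correlate2d(x, k):
    # What CNN "convolution" layers actually compute: slide k as-is.
    h, w = k.shape
    out = np.empty((x.shape[0] - h + 1, x.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def convolve2d(x, k):
    # True convolution = cross-correlation with a doubly flipped kernel.
    return cross_correlate2d(x, k[::-1, ::-1])

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(x, k))   # differs from convolve2d(x, k)
print(convolve2d(x, k))          # unless the kernel is symmetric
```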

References

  1. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18(153), 1–43 (2017)

  2. Behrmann, J., Duvenaud, D., Jacobsen, J.: Invertible residual networks. CoRR abs/1811.00995 (2018)

  3. Behrmann, J., Vicol, P., Wang, K., Grosse, R.B., Jacobsen, J.: Understanding and mitigating exploding inverses in invertible neural networks. In: AISTATS 2021. PMLR, vol. 130, pp. 1792–1800 (2021)

  4. Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973). https://doi.org/10.1147/rd.176.0525

  5. Brown, T.B., et al.: Language models are few-shot learners. In: NeurIPS 2020, Proceedings (2020)

  6. Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: AAAI-18. IAAI-18, EAAI-18, Proceedings, pp. 2811–2818. AAAI Press, Washington (2018)

  7. Demaine, E.D., Lynch, J., Mirano, G.J., Tyagi, N.: Energy-efficient algorithms. In: ITCS 2016, Proceedings, pp. 321–332. ACM, New York (2016)

8. Demaine, E.D., Lynch, J., Sun, J.: An efficient reversible algorithm for linear regression. In: ICRC 2021, Proceedings, pp. 103–108. IEEE, Washington (2021)

  9. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: Bengio, Y., LeCun, Y. (eds.) ICLR 2015, Workshop Track Proceedings (2015)

  10. Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real NVP. In: ICLR 2017, Proceedings. OpenReview.net (2017)

11. Frank, M.P.: Reversibility for efficient computing. Ph.D. thesis, MIT (1999)

  12. Frank, M.P.: Introduction to reversible computing: motivation, progress, and challenges. In: Bagherzadeh, N., Valero, M., Ramírez, A. (eds.) Computing Frontiers 2005, Proceedings, pp. 385–390. ACM (2005)

  13. Gander, W.: Algorithms for the QR decomposition (2003)

  14. García-Martín, E., Rodrigues, C.F., Riley, G.D., Grahn, H.: Estimation of energy consumption in machine learning. J. Parallel Distributed Comput. 134, 75–88 (2019). https://doi.org/10.1016/j.jpdc.2019.07.007

  15. Goel, A., Tung, C., Lu, Y., Thiruvathukal, G.K.: A survey of methods for low-power deep learning and computer vision. In: WF-IoT 2020, Proceedings, pp. 1–6. IEEE, New Orleans (2020). https://doi.org/10.1109/WF-IoT48130.2020.9221198

  16. Gomez, A.N., Ren, M., Urtasun, R., Grosse, R.B.: The reversible residual network: backpropagation without storing activations. In: NIPS 2017, Proceedings, pp. 2214–2224 (2017)

  17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR 2016, Proceedings, pp. 770–778. IEEE Computer Society (2016)

  18. Hoey, J., Ulidowski, I.: Reversing an imperative concurrent programming language. Sci. Comput. Program. 223, 102873 (2022). https://doi.org/10.1016/j.scico.2022.102873

  19. Hoey, J., Ulidowski, I., Yuen, S.: Reversing parallel programs with blocks and procedures. In: EXPRESS/SOS 2018, Proceedings, EPTCS, vol. 276, pp. 69–86 (2018). https://doi.org/10.4204/EPTCS.276.7

  20. Hoogeboom, E., van den Berg, R., Welling, M.: Emerging convolutions for generative normalizing flows. In: Chaudhuri, K., Salakhutdinov, R. (eds.) ICML 2019, Proceedings. PMLR, vol. 97, pp. 2771–2780 (2019)

  21. Imakura, A., Yamamoto, Y.: Efficient implementations of the modified Gram-Schmidt orthogonalization with a non-standard inner product. CoRR (2017)

  22. Jacobsen, J., Smeulders, A.W.M., Oyallon, E.: i-RevNet: deep invertible networks. CoRR abs/1802.07088 (2018). http://arxiv.org/abs/1802.07088

  23. Kaelbling, L.: Introduction to Machine Learning. Course Notes, MIT Open Learning Library (2020)

  24. Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html (2009)

  25. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  26. Morgan, T.P.: Counting the cost of training large language models. https://www.nextplatform.com/2022/12/01/counting-the-cost-of-training-large-language-models/ (2022)

27. Nagar, S., Dufraisse, M., Varma, G.: CInC flow: characterizable invertible \(3 \times 3\) convolution. CoRR abs/2107.01358 (2021)

  28. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML-10, Proceedings, pp. 807–814. Omnipress (2010)

  29. Image classification on ImageNet. Papers with Code. https://paperswithcode.com/sota/image-classification-on-imagenet (2022)

30. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)

  31. Rambhatla, S.S., Jones, M., Chellappa, R.: To boost or not to boost: on the limits of boosted neural networks. CoRR abs/2107.13600 (2021)

  32. Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)

  33. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  34. Schordan, M., Oppelstrup, T., Thomsen, M.K., Glück, R.: Reversible languages and incremental state saving in optimistic parallel discrete event simulation. In: Ulidowski, I., Lanese, I., Schultz, U.P., Ferreira, C. (eds.) RC 2020. LNCS, vol. 12070, pp. 187–207. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47361-7_9

  35. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015). https://arxiv.org/abs/1505.00387

  36. Convolutional neural networks for visual recognition. CS231n Course Notes. https://cs231n.github.io/. Stanford University (2023)

37. Vaswani, A., et al.: Attention is all you need. In: NIPS 2017, Proceedings, pp. 5998–6008 (2017)

38. Wu, Z., et al.: Application of image retrieval based on convolutional neural networks and Hu invariant moment algorithm in computer telecommunications. Comput. Commun. 150, 729–738 (2020)

39. Yokoyama, T., Glück, R.: A reversible programming language and its invertible self-interpreter. In: PEPM 2007, pp. 144–153. ACM (2007). https://doi.org/10.1145/1244381.1244404

  40. Zhang, A., Lipton, Z.C., Li, M., Smola, A.J.: Dive into deep learning. CoRR abs/2106.11342 (2021)

  41. Zhao, Y., Zhou, S., Zhang, Z.: Multi-split reversible transformers can enhance neural machine translation. In: EACL 2021, Proceedings, pp. 244–254. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.eacl-main.19

Author information

Correspondence to Irek Ulidowski.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ulidowski, I. (2023). Saving Memory Space in Deep Neural Networks by Recomputing: A Survey. In: Kutrib, M., Meyer, U. (eds) Reversible Computation. RC 2023. Lecture Notes in Computer Science, vol 13960. Springer, Cham. https://doi.org/10.1007/978-3-031-38100-3_7

  • DOI: https://doi.org/10.1007/978-3-031-38100-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38099-0

  • Online ISBN: 978-3-031-38100-3

  • eBook Packages: Computer Science, Computer Science (R0)
