Abstract
Training a multilayered neural network involves executing the network on the training data, calculating the error between the predicted and the actual output, and then performing backpropagation to update the network's weights so as to minimise the overall error. This process is repeated many times, with the network updating its weights until it produces the desired output with a satisfactory level of accuracy. Each training run requires storing in memory the activation and gradient data for every layer of the network. This paper surveys the main approaches to recomputing the needed activation and gradient data instead of storing it in memory. We discuss how these approaches relate to reversible computation techniques.
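To make the recompute-instead-of-store idea concrete, the following minimal Python sketch (not taken from the paper; the sub-functions F, G and their weights are illustrative assumptions) shows an additive-coupling reversible block in the style of the reversible residual networks of Gomez et al. and the NICE coupling layers of Dinh et al.: the block's inputs can be reconstructed exactly from its outputs, so intermediate activations can be recomputed during backpropagation rather than stored.

```python
import numpy as np

# Additive-coupling reversible block: given outputs (y1, y2), the inputs
# (x1, x2) are recoverable exactly, so they need not be kept in memory.
# F and G are arbitrary illustrative sub-networks, not the paper's.

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4)) * 0.1
W_g = rng.standard_normal((4, 4)) * 0.1

def F(x):
    return np.maximum(W_f @ x, 0.0)  # small ReLU sub-network

def G(x):
    return np.maximum(W_g @ x, 0.0)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recompute the inputs from the outputs instead of storing them.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

In such an architecture only the outputs of the final layer need to be kept; each layer's activations are recovered on the fly as the backward pass sweeps from the output towards the input, trading extra computation for memory.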
I. Ulidowski has been partially supported by JSPS Fellowship grant S21050.
Notes
- 1. A detailed explanation of how the algorithm works is given in [8].
- 2.
References
Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18(153), 1–43 (2017)
Behrmann, J., Duvenaud, D., Jacobsen, J.: Invertible residual networks. CoRR abs/1811.00995 (2018)
Behrmann, J., Vicol, P., Wang, K., Grosse, R.B., Jacobsen, J.: Understanding and mitigating exploding inverses in invertible neural networks. In: AISTATS 2021. PMLR, vol. 130, pp. 1792–1800 (2021)
Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973). https://doi.org/10.1147/rd.176.0525
Brown, T.B., et al.: Language models are few-shot learners. In: NeurIPS 2020, Proceedings (2020)
Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: AAAI-18. IAAI-18, EAAI-18, Proceedings, pp. 2811–2818. AAAI Press, Washington (2018)
Demaine, E.D., Lynch, J., Mirano, G.J., Tyagi, N.: Energy-efficient algorithms. In: ITCS 2016, Proceedings, pp. 321–332. ACM, New York (2016)
Demaine, E.D., Lynch, J., Sun, J.: An efficient reversible algorithm for linear regression. In: ICRC 2021, Proceedings, pp. 103–108. IEEE, Washington (2021)
Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: Bengio, Y., LeCun, Y. (eds.) ICLR 2015, Workshop Track Proceedings (2015)
Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real NVP. In: ICLR 2017, Proceedings. OpenReview.net (2017)
Frank, M.P.: Reversibility for efficient computing. Ph.D. thesis, MIT (1999)
Frank, M.P.: Introduction to reversible computing: motivation, progress, and challenges. In: Bagherzadeh, N., Valero, M., Ramírez, A. (eds.) Computing Frontiers 2005, Proceedings, pp. 385–390. ACM (2005)
Gander, W.: Algorithms for the QR decomposition (2003)
García-Martín, E., Rodrigues, C.F., Riley, G.D., Grahn, H.: Estimation of energy consumption in machine learning. J. Parallel Distributed Comput. 134, 75–88 (2019). https://doi.org/10.1016/j.jpdc.2019.07.007
Goel, A., Tung, C., Lu, Y., Thiruvathukal, G.K.: A survey of methods for low-power deep learning and computer vision. In: WF-IoT 2020, Proceedings, pp. 1–6. IEEE, New Orleans (2020). https://doi.org/10.1109/WF-IoT48130.2020.9221198
Gomez, A.N., Ren, M., Urtasun, R., Grosse, R.B.: The reversible residual network: backpropagation without storing activations. In: NIPS 2017, Proceedings, pp. 2214–2224 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR 2016, Proceedings, pp. 770–778. IEEE Computer Society (2016)
Hoey, J., Ulidowski, I.: Reversing an imperative concurrent programming language. Sci. Comput. Program. 223, 102873 (2022). https://doi.org/10.1016/j.scico.2022.102873
Hoey, J., Ulidowski, I., Yuen, S.: Reversing parallel programs with blocks and procedures. In: EXPRESS/SOS 2018, Proceedings, EPTCS, vol. 276, pp. 69–86 (2018). https://doi.org/10.4204/EPTCS.276.7
Hoogeboom, E., van den Berg, R., Welling, M.: Emerging convolutions for generative normalizing flows. In: Chaudhuri, K., Salakhutdinov, R. (eds.) ICML 2019, Proceedings. PMLR, vol. 97, pp. 2771–2780 (2019)
Imakura, A., Yamamoto, Y.: Efficient implementations of the modified Gram-Schmidt orthogonalization with a non-standard inner product. CoRR (2017)
Jacobsen, J., Smeulders, A.W.M., Oyallon, E.: i-RevNet: deep invertible networks. CoRR abs/1802.07088 (2018). http://arxiv.org/abs/1802.07088
Kaelbling, L.: Introduction to Machine Learning. Course Notes, MIT Open Learning Library (2020)
Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Morgan, T.P.: Counting the cost of training large language models. https://www.nextplatform.com/2022/12/01/counting-the-cost-of-training-large-language-models/ (2022)
Nagar, S., Dufraisse, M., Varma, G.: CInC flow: Characterizable invertible 3 × 3 convolution. CoRR abs/2107.01358 (2021)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML-10, Proceedings, pp. 807–814. Omnipress (2010)
Image classification on ImageNet. Papers with Code. https://paperswithcode.com/sota/image-classification-on-imagenet (2022)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)
Rambhatla, S.S., Jones, M., Chellappa, R.: To boost or not to boost: on the limits of boosted neural networks. CoRR abs/2107.13600 (2021)
Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Schordan, M., Oppelstrup, T., Thomsen, M.K., Glück, R.: Reversible languages and incremental state saving in optimistic parallel discrete event simulation. In: Ulidowski, I., Lanese, I., Schultz, U.P., Ferreira, C. (eds.) RC 2020. LNCS, vol. 12070, pp. 187–207. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47361-7_9
Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015). https://arxiv.org/abs/1505.00387
Convolutional neural networks for visual recognition. CS231n Course Notes. https://cs231n.github.io/. Stanford University (2023)
Vaswani, A., et al.: Attention is all you need. In: NIPS 2017, Proceedings, pp. 5998–6008 (2017)
Wu, Z., et al.: Application of image retrieval based on convolutional neural networks and Hu invariant moment algorithm in computer telecommunications. Comput. Commun. 150, 729–738 (2020)
Yokoyama, T., Glück, R.: A reversible programming language and its invertible self-interpreter. In: PEPM 2007, Proceedings, pp. 144–153. ACM (2007). https://doi.org/10.1145/1244381.1244404
Zhang, A., Lipton, Z.C., Li, M., Smola, A.J.: Dive into deep learning. CoRR abs/2106.11342 (2021)
Zhao, Y., Zhou, S., Zhang, Z.: Multi-split reversible transformers can enhance neural machine translation. In: EACL 2021, Proceedings, pp. 244–254. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.eacl-main.19
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ulidowski, I. (2023). Saving Memory Space in Deep Neural Networks by Recomputing: A Survey. In: Kutrib, M., Meyer, U. (eds) Reversible Computation. RC 2023. Lecture Notes in Computer Science, vol 13960. Springer, Cham. https://doi.org/10.1007/978-3-031-38100-3_7
DOI: https://doi.org/10.1007/978-3-031-38100-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38099-0
Online ISBN: 978-3-031-38100-3
eBook Packages: Computer Science, Computer Science (R0)