Abstract
Training a multilayered neural network involves executing the network on the training data, calculating the error between the predicted and the actual output, and then performing backpropagation to update the network's weights so as to minimise the overall error. This process is repeated many times, with the network updating its weights until it produces the desired output with a satisfactory level of accuracy. Each training run requires storing in memory the activation and gradient data for every layer of the network. This paper surveys the main approaches to recomputing the needed activation and gradient data instead of storing it in memory. We discuss how these approaches relate to reversible computation techniques.
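To make the recompute-instead-of-store idea concrete, the following minimal Python sketch (not taken from the paper; the sub-functions F, G and their weights are illustrative assumptions) shows an additive-coupling reversible block in the style of the reversible residual networks of Gomez et al. and the NICE coupling layers of Dinh et al.: the block's inputs can be reconstructed exactly from its outputs, so intermediate activations can be recomputed during backpropagation rather than stored.

```python
import numpy as np

# Additive-coupling reversible block: given outputs (y1, y2), the inputs
# (x1, x2) are recoverable exactly, so they need not be kept in memory.
# F and G are arbitrary illustrative sub-networks, not the paper's.

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4)) * 0.1
W_g = rng.standard_normal((4, 4)) * 0.1

def F(x):
    return np.maximum(W_f @ x, 0.0)  # small ReLU sub-network

def G(x):
    return np.maximum(W_g @ x, 0.0)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recompute the inputs from the outputs instead of storing them.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

In such an architecture only the outputs of the final layer need to be kept; each layer's activations are recovered on the fly as the backward pass sweeps from the output towards the input, trading extra computation for memory.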
I. Ulidowski has been partially supported by JSPS Fellowship grant S21050.
Notes
- 1. A detailed explanation of how the algorithm works is given in [8].
- 2.
References
Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18(153), 1–43 (2017)
Behrmann, J., Duvenaud, D., Jacobsen, J.: Invertible residual networks. CoRR abs/1811.00995 (2018)
Behrmann, J., Vicol, P., Wang, K., Grosse, R.B., Jacobsen, J.: Understanding and mitigating exploding inverses in invertible neural networks. In: AISTATS 2021. PMLR, vol. 130, pp. 1792–1800 (2021)
Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973). https://doi.org/10.1147/rd.176.0525
Brown, T.B., et al.: Language models are few-shot learners. In: NeurIPS 2020, Proceedings (2020)
Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: AAAI-18. IAAI-18, EAAI-18, Proceedings, pp. 2811–2818. AAAI Press, Washington (2018)
Demaine, E.D., Lynch, J., Mirano, G.J., Tyagi, N.: Energy-efficient algorithms. In: ITCS 2016, Proceedings, pp. 321–332. ACM, New York (2016)
Demaine, E.D., Lynch, J., Sun, J.: An efficient reversible algorithm for linear regression. In: ICRC 2021, Proceedings, pp. 103–108. IEEE, Washington (2021)
Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: Bengio, Y., LeCun, Y. (eds.) ICLR 2015, Workshop Track Proceedings (2015)
Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real NVP. In: ICLR 2017, Proceedings. OpenReview.net (2017)
Frank, M.P.: Reversibility for efficient computing. Ph.D. thesis, MIT (1999)
Frank, M.P.: Introduction to reversible computing: motivation, progress, and challenges. In: Bagherzadeh, N., Valero, M., Ramírez, A. (eds.) Computing Frontiers 2005, Proceedings, pp. 385–390. ACM (2005)
Gander, W.: Algorithms for the QR decomposition (2003)
García-Martín, E., Rodrigues, C.F., Riley, G.D., Grahn, H.: Estimation of energy consumption in machine learning. J. Parallel Distributed Comput. 134, 75–88 (2019). https://doi.org/10.1016/j.jpdc.2019.07.007
Goel, A., Tung, C., Lu, Y., Thiruvathukal, G.K.: A survey of methods for low-power deep learning and computer vision. In: WF-IoT 2020, Proceedings, pp. 1–6. IEEE, New Orleans (2020). https://doi.org/10.1109/WF-IoT48130.2020.9221198
Gomez, A.N., Ren, M., Urtasun, R., Grosse, R.B.: The reversible residual network: backpropagation without storing activations. In: NIPS 2017, Proceedings, pp. 2214–2224 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR 2016, Proceedings, pp. 770–778. IEEE Computer Society (2016)
Hoey, J., Ulidowski, I.: Reversing an imperative concurrent programming language. Sci. Comput. Program. 223, 102873 (2022). https://doi.org/10.1016/j.scico.2022.102873
Hoey, J., Ulidowski, I., Yuen, S.: Reversing parallel programs with blocks and procedures. In: EXPRESS/SOS 2018, Proceedings, EPTCS, vol. 276, pp. 69–86 (2018). https://doi.org/10.4204/EPTCS.276.7
Hoogeboom, E., van den Berg, R., Welling, M.: Emerging convolutions for generative normalizing flows. In: Chaudhuri, K., Salakhutdinov, R. (eds.) ICML 2019, Proceedings. PMLR, vol. 97, pp. 2771–2780 (2019)
Imakura, A., Yamamoto, Y.: Efficient implementations of the modified Gram-Schmidt orthogonalization with a non-standard inner product. CoRR (2017)
Jacobsen, J., Smeulders, A.W.M., Oyallon, E.: i-RevNet: deep invertible networks. CoRR abs/1802.07088 (2018). http://arxiv.org/abs/1802.07088
Kaelbling, L.: Introduction to Machine Learning. Course Notes, MIT Open Learning Library (2020)
Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Morgan, T.P.: Counting the cost of training large language models. https://www.nextplatform.com/2022/12/01/counting-the-cost-of-training-large-language-models/ (2022)
Nagar, S., Dufraisse, M., Varma, G.: CInC flow: Characterizable invertible 3 × 3 convolution. CoRR abs/2107.01358 (2021)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML-10, Proceedings, pp. 807–814. Omnipress (2010)
Image classification on ImageNet. Papers with Code. https://paperswithcode.com/sota/image-classification-on-imagenet (2022)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)
Rambhatla, S.S., Jones, M., Chellappa, R.: To boost or not to boost: on the limits of boosted neural networks. CoRR abs/2107.13600 (2021)
Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Schordan, M., Oppelstrup, T., Thomsen, M.K., Glück, R.: Reversible languages and incremental state saving in optimistic parallel discrete event simulation. In: Ulidowski, I., Lanese, I., Schultz, U.P., Ferreira, C. (eds.) RC 2020. LNCS, vol. 12070, pp. 187–207. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47361-7_9
Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015). https://arxiv.org/abs/1505.00387
Convolutional neural networks for visual recognition. CS231n Course Notes. https://cs231n.github.io/. Stanford University (2023)
Vaswani, A., et al.: Attention is all you need. In: NIPS 2017, Proceedings, pp. 5998–6008 (2017)
Wu, Z., et al.: Application of image retrieval based on convolutional neural networks and Hu invariant moment algorithm in computer telecommunications. Comput. Commun. 150, 729–738 (2020)
Yokoyama, T., Glück, R.: A reversible programming language and its invertible self-interpreter. In: PEPM 2007, Proceedings, pp. 144–153. ACM (2007). https://doi.org/10.1145/1244381.1244404
Zhang, A., Lipton, Z.C., Li, M., Smola, A.J.: Dive into deep learning. CoRR abs/2106.11342 (2021)
Zhao, Y., Zhou, S., Zhang, Z.: Multi-split reversible transformers can enhance neural machine translation. In: EACL 2021, Proceedings, pp. 244–254. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.eacl-main.19
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ulidowski, I. (2023). Saving Memory Space in Deep Neural Networks by Recomputing: A Survey. In: Kutrib, M., Meyer, U. (eds) Reversible Computation. RC 2023. Lecture Notes in Computer Science, vol 13960. Springer, Cham. https://doi.org/10.1007/978-3-031-38100-3_7
DOI: https://doi.org/10.1007/978-3-031-38100-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38099-0
Online ISBN: 978-3-031-38100-3
eBook Packages: Computer Science, Computer Science (R0)