Abstract
Recurrent neural networks (RNNs) process sequential data to capture time dependencies in the input signal. Training a deep RNN conventionally involves segmenting the data sequence so that the unrolled model fits in memory. Increasing the segment size lets the model capture longer-term dependencies, at the expense of a larger unrolled computation that may not fit in memory. We therefore introduce a technique that allows designers to train a segmented RNN and obtain the same model parameters as if the entire data sequence had been applied, regardless of the segment size, enabling optimal capture of long-term dependencies. The technique can increase computational complexity during training, so it grants designers the flexibility to balance memory and runtime requirements. To evaluate the proposed method, we compared the total loss achieved on the testing dataset after every epoch while varying the segment size. The resulting loss graphs match irrespective of the segment size.
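The abstract does not spell out the algorithm, but one standard construction with exactly this property is checkpointed (segment-wise) backpropagation through time: store only the hidden state at each segment boundary during the forward sweep, then, during the backward sweep, recompute each segment's forward pass and carry the hidden-state gradient across segment boundaries. Below is a minimal NumPy sketch of that idea, assuming a vanilla tanh RNN with a per-step squared loss; the names (Wx, Wh, forward_segment, and so on) are illustrative assumptions, not taken from the paper.

```python
# Segment-wise BPTT sketch: same gradients as full-sequence BPTT for any
# segment size S, while storing only one hidden state per segment boundary.
import numpy as np

rng = np.random.default_rng(0)
I, H, T, S = 3, 5, 12, 4          # input dim, hidden dim, sequence length, segment size
Wx = rng.normal(0, 0.1, (H, I))   # input-to-hidden weights
Wh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden weights
b  = np.zeros(H)

def forward_segment(x_seg, h0):
    """Run the RNN over one segment; return all hidden states incl. h0."""
    hs = [h0]
    for x in x_seg:
        hs.append(np.tanh(Wx @ x + Wh @ hs[-1] + b))
    return hs

def backward_segment(x_seg, hs, dh_next, targets):
    """Backprop one segment; dh_next is dL/dh at the segment's right boundary."""
    dWx, dWh, db = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)
    dh = dh_next
    for t in reversed(range(len(x_seg))):
        dh = dh + (hs[t + 1] - targets[t])   # per-step squared-loss gradient
        dz = dh * (1 - hs[t + 1] ** 2)       # through the tanh nonlinearity
        dWx += np.outer(dz, x_seg[t])
        dWh += np.outer(dz, hs[t])
        db  += dz
        dh = Wh.T @ dz                       # flows back to the previous step
    return dWx, dWh, db, dh                  # dh is dL/dh at the left boundary

x = rng.normal(size=(T, I))                  # toy inputs
y = rng.normal(size=(T, H))                  # toy per-step targets

# Forward sweep: keep only one boundary hidden state per segment.
boundaries = [np.zeros(H)]
for s in range(0, T, S):
    boundaries.append(forward_segment(x[s:s + S], boundaries[-1])[-1])

# Backward sweep in reverse segment order, carrying dL/dh across boundaries.
gWx, gWh, gb = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)
dh = np.zeros(H)
for i, s in reversed(list(enumerate(range(0, T, S)))):
    hs = forward_segment(x[s:s + S], boundaries[i])  # recompute within segment
    dWx, dWh, db, dh = backward_segment(x[s:s + S], hs, dh, y[s:s + S])
    gWx += dWx; gWh += dWh; gb += db
# (gWx, gWh, gb) now equal the full-sequence BPTT gradients for any S.
```

In this sketch, peak hidden-state storage drops from O(H·T) to O(H·(T/S + S)) at the cost of roughly one extra forward pass per segment, which mirrors the memory-versus-runtime trade-off described in the abstract.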