Skip to main content

Brief Announcement: Gradual Learning of Deep Recurrent Neural Network

  • Conference paper
  • First Online:
Book cover Cyber Security Cryptography and Machine Learning (CSCML 2018)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 10879))

Abstract

Deep Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, deep RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI) we formulate the multi-layered network as a Markov chain, introducing a training method that comprises training the network gradually and using layer-wise gradient clipping. In total, we have found that applying our methods combined with previously introduced regularization and optimization methods resulted in improvement to the state-of-the-art architectures operating in language modeling tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bianchini, M., Scarselli, F.: On the complexity of neural network classifiers: a comparison between shallow and deep architectures. IEEE Trans. Neural Netw. Learn. Syst. (2014)

    Google Scholar 

  2. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches (2014). arXiv preprint arXiv:1409.1259

  3. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN Encoder-Decoder for statistical machine translation (2014). arXiv preprint arXiv:1406.107

  4. Cooijmans, T., Ballas, N., Laurent, C., Gülçehre, Ç., Courville, A.: Recurrent batch normalization (2016). arXiv preprint arXiv:1603.09025

  5. Ha, D., Dai, A., Le, Q.V.: Hypernetworks (2016). arXiv preprint arXiv:1609.09106

  6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv preprint arXiv:1512.03385

  7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  8. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015). arXiv preprint arXiv:1502.03167

  9. Krause, B., Kahembwe, E., Murray, I., Renals, S.: Dynamic evaluation of neural sequence models (2017). arXiv preprint arXiv:1709.07432

  10. Melis, G., Dyer, C., Blunsom, P.: On the State of the Art of Evaluation in Neural Language Models. ArXiv e-prints, July 2017

    Google Scholar 

  11. Merity, S., Shirish Keskar, N., Socher, R.: Regularizing and Optimizing LSTM Language Models. ArXiv e-prints, August 2017

    Google Scholar 

  12. Montufar, G., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks (2014). arXiv preprint arXiv:1402.1869

  13. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plann. Infer. 90(2), 227–244 (2000)

    Article  MathSciNet  Google Scholar 

  14. Smith, L.N., Hand, E.M., Doster, T.: Gradual dropin of layers to train very deep neural networks (2015). arXiv preprint arXiv:1511.06951

  15. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on International Conference on Machine Learning, ICML 2013, vol. 28, pp. III-1139–III-1147 (2013). JMLR.org

  16. Yang, Z., Dai, Z., Salakhutdinov, R., Cohen, W.W.: Breaking the softmax bottleneck: a high-rank RNN language model (2017). arXiv preprint arXiv:1711.03953

  17. Zilly, J.G., Srivastava, R.K., Koutník, J., Schmidhuber, J.: Recurrent highway networks (2016). arXiv preprint arXiv:1607.03474

  18. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning (2016). arXiv preprint arXiv:1611.01578

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ziv Aharoni .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Aharoni, Z., Rattner, G., Permuter, H. (2018). Brief Announcement: Gradual Learning of Deep Recurrent Neural Network. In: Dinur, I., Dolev, S., Lodha, S. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2018. Lecture Notes in Computer Science(), vol 10879. Springer, Cham. https://doi.org/10.1007/978-3-319-94147-9_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-94147-9_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94146-2

  • Online ISBN: 978-3-319-94147-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics