Enhanced LSTM with Batch Normalization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11953)

Abstract

Recurrent neural networks (RNNs) are powerful models for sequence learning. However, training RNNs is complicated by the internal covariate shift problem, in which the input distribution at each layer changes during training as the parameters are updated. Although some work has applied batch normalization (BN) to alleviate this problem in long short-term memory (LSTM) networks, BN has not previously been applied to the update of the LSTM cell. In this paper, to tackle the internal covariate shift problem of LSTM, we introduce a method that successfully integrates BN into the update of the LSTM cell. Experimental results on two benchmark data sets, MNIST and Fashion-MNIST, show that the proposed method, enhanced LSTM with BN (eLSTM-BN), achieves faster convergence than LSTM and its variants, while obtaining higher classification accuracy on sequence learning tasks.
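The abstract does not give the authors' exact update equations, but the core idea it describes, normalizing both the gate pre-activations and the updated cell state over the mini-batch, can be sketched as below. All function and parameter names are illustrative assumptions, not the paper's notation, and the exact placement of BN in eLSTM-BN may differ:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-feature normalization over the mini-batch (training-mode statistics).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bn_lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with BN on the input-to-hidden and hidden-to-hidden
    pre-activations, plus BN on the updated cell state before it feeds
    the output gate -- a sketch of "BN in the cell update"."""
    # Joint weights for the four gates: input (i), forget (f), output (o),
    # candidate (g); hidden size H gives pre-activations of width 4H.
    pre = (batch_norm(x @ p["Wx"], p["gamma_x"], p["beta_x"])
           + batch_norm(h_prev @ p["Wh"], p["gamma_h"], p["beta_h"])
           + p["b"])
    i, f, o, g = np.split(pre, 4, axis=1)
    # Cell update, as in a standard LSTM.
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    # Extra BN on the updated cell state before the output nonlinearity.
    h = sigmoid(o) * np.tanh(batch_norm(c, p["gamma_c"], p["beta_c"]))
    return h, c
```

In a full implementation one would also keep running mean/variance statistics for inference and, as in recurrent batch normalization, possibly separate statistics per time step; those details are omitted here.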



Acknowledgments

This work was supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Key R&D Program of China under Grant No. 2016YFC1401004, the National Natural Science Foundation of China (NSFC) under Grant No. 41706010, the Science and Technology Program of Qingdao under Grant No. 17-3-3-20-nsh, the CERNET Innovation Project under Grant No. NGII20170416, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Open Project Program of the Key Laboratory of Research on Marine Hazards Forecasting, National Marine Environmental Forecasting Center, State Oceanic Administration (SOA), under Grant No. LOMF1802, the Graduate Education Reform and Research Project of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China.

Author information


Corresponding author

Correspondence to Guoqiang Zhong.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L.N., Zhong, G., Yan, S., Dong, J., Huang, K. (2019). Enhanced LSTM with Batch Normalization. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol. 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_61

  • DOI: https://doi.org/10.1007/978-3-030-36708-4_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36707-7

  • Online ISBN: 978-3-030-36708-4

  • eBook Packages: Computer Science (R0)
