Abstract
Recurrent neural networks (RNNs) are powerful models for sequence learning. However, training RNNs is complicated by the internal covariate shift problem, in which the input distribution of each unit changes during training as the parameters are updated. Although some work has applied batch normalization (BN) to alleviate this problem in long short-term memory (LSTM) networks, BN has not been applied to the update of the LSTM cell itself. In this paper, to tackle the internal covariate shift problem in LSTM, we introduce a method that successfully integrates BN into the update of the LSTM cell. Experimental results on two benchmark data sets, MNIST and Fashion-MNIST, show that the proposed method, enhanced LSTM with BN (eLSTM-BN), achieves faster convergence than LSTM and its variants, while obtaining higher classification accuracy on sequence learning tasks.
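The abstract describes applying batch normalization inside the LSTM recurrence, including the cell update. The paper's exact formulation is not reproduced here, so the sketch below is only one plausible placement (in the spirit of recurrent batch normalization): BN on the input-to-hidden and hidden-to-hidden pre-activations and on the cell state before the output gate. All parameter names (`W`, `U`, the per-normalization `gamma`/`beta` pairs) are illustrative assumptions, not the authors' notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def lstm_bn_step(x, h_prev, c_prev, p):
    """One LSTM time step with BN applied to both pre-activation terms
    and to the cell state -- a hypothetical sketch, not the paper's
    exact eLSTM-BN equations."""
    z = (batch_norm(x @ p["W"], p["g_x"], p["b_x"])
         + batch_norm(h_prev @ p["U"], p["g_h"], p["b_h"])
         + p["bias"])
    # Split the joint pre-activation into input, forget, output, candidate.
    i, f, o, g = np.split(z, 4, axis=1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(batch_norm(c, p["g_c"], p["b_c"]))
    return h, c

# Toy forward pass: batch of 8, input dim 10, hidden dim 16.
B, D, H = 8, 10, 16
params = {
    "W": rng.normal(scale=0.1, size=(D, 4 * H)),
    "U": rng.normal(scale=0.1, size=(H, 4 * H)),
    "bias": np.zeros(4 * H),
    "g_x": np.ones(4 * H), "b_x": np.zeros(4 * H),
    "g_h": np.ones(4 * H), "b_h": np.zeros(4 * H),
    "g_c": np.ones(H), "b_c": np.zeros(H),
}
h = np.zeros((B, H))
c = np.zeros((B, H))
x = rng.normal(size=(B, D))
h, c = lstm_bn_step(x, h, c, params)
print(h.shape, c.shape)  # (8, 16) (8, 16)
```

In practice each normalization keeps running statistics for inference and the time-step-dependent statistics need care; the sketch shows only the training-time forward computation.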
Acknowledgments
This work was supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Key R&D Program of China under Grant No. 2016YFC1401004, the National Natural Science Foundation of China (NSFC) under Grant No. 41706010, the Science and Technology Program of Qingdao under Grant No. 17-3-3-20-nsh, the CERNET Innovation Project under Grant No. NGII20170416, the Joint Fund of the Equipment Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Open Project Program of the Key Laboratory of Research on Marine Hazards Forecasting, National Marine Environmental Forecasting Center, State Oceanic Administration (SOA), under Grant No. LOMF1802, the Graduate Education Reform and Research Project of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China.
© 2019 Springer Nature Switzerland AG
Cite this paper
Wang, LN., Zhong, G., Yan, S., Dong, J., Huang, K. (2019). Enhanced LSTM with Batch Normalization. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science(), vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36707-7
Online ISBN: 978-3-030-36708-4