Abstract
Recurrent neural networks (RNNs) are powerful models for sequence learning. However, training RNNs is complicated by the internal covariate shift problem, in which the input distribution of each unit changes during training as the parameters are updated. Although some work has applied batch normalization (BN) to alleviate this problem in long short-term memory (LSTM) networks, BN has not been applied to the update of the LSTM cell itself. In this paper, to tackle the internal covariate shift problem in LSTM, we introduce a method that successfully integrates BN into the update of the LSTM cell. Experimental results on two benchmark data sets, MNIST and Fashion-MNIST, show that the proposed method, enhanced LSTM with BN (eLSTM-BN), achieves faster convergence than LSTM and its variants, while obtaining higher classification accuracy on sequence learning tasks.
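The abstract describes applying batch normalization inside the LSTM recurrence, including the cell update. The paper's exact formulation is not reproduced here, so the sketch below is only one plausible placement (in the spirit of recurrent batch normalization): BN on the input-to-hidden and hidden-to-hidden pre-activations and on the cell state before the output gate. All parameter names (`W`, `U`, the per-normalization `gamma`/`beta` pairs) are illustrative assumptions, not the authors' notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def lstm_bn_step(x, h_prev, c_prev, p):
    """One LSTM time step with BN applied to both pre-activation terms
    and to the cell state -- a hypothetical sketch, not the paper's
    exact eLSTM-BN equations."""
    z = (batch_norm(x @ p["W"], p["g_x"], p["b_x"])
         + batch_norm(h_prev @ p["U"], p["g_h"], p["b_h"])
         + p["bias"])
    # Split the joint pre-activation into input, forget, output, candidate.
    i, f, o, g = np.split(z, 4, axis=1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(batch_norm(c, p["g_c"], p["b_c"]))
    return h, c

# Toy forward pass: batch of 8, input dim 10, hidden dim 16.
B, D, H = 8, 10, 16
params = {
    "W": rng.normal(scale=0.1, size=(D, 4 * H)),
    "U": rng.normal(scale=0.1, size=(H, 4 * H)),
    "bias": np.zeros(4 * H),
    "g_x": np.ones(4 * H), "b_x": np.zeros(4 * H),
    "g_h": np.ones(4 * H), "b_h": np.zeros(4 * H),
    "g_c": np.ones(H), "b_c": np.zeros(H),
}
h = np.zeros((B, H))
c = np.zeros((B, H))
x = rng.normal(size=(B, D))
h, c = lstm_bn_step(x, h, c, params)
print(h.shape, c.shape)  # (8, 16) (8, 16)
```

In practice each normalization keeps running statistics for inference and the time-step-dependent statistics need care; the sketch shows only the training-time forward computation.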
Acknowledgments
This work was supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Key R&D Program of China under Grant No. 2016YFC1401004, the National Natural Science Foundation of China (NSFC) under Grant No. 41706010, the Science and Technology Program of Qingdao under Grant No. 17-3-3-20-nsh, the CERNET Innovation Project under Grant No. NGII20170416, the Joint Fund of the Equipment Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Open Project Program of the Key Laboratory of Research on Marine Hazards Forecasting, National Marine Environmental Forecasting Center, State Oceanic Administration (SOA), under Grant No. LOMF1802, the Graduate Education Reform and Research Project of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China.
© 2019 Springer Nature Switzerland AG
Cite this paper
Wang, LN., Zhong, G., Yan, S., Dong, J., Huang, K. (2019). Enhanced LSTM with Batch Normalization. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science(), vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36707-7
Online ISBN: 978-3-030-36708-4