Abstract
Batch Normalization was introduced as a novel solution to aid the training of fully-connected feed-forward deep neural networks. It normalizes each training batch in order to alleviate the problem caused by internal covariate shift. The original method claimed that Batch Normalization must be performed before the ReLU activation during training for optimal results. However, a second approach has since gained ground, which stresses the importance of performing BN after the ReLU activation in order to maximize performance. In fact, in the source code of PyTorch, common architectures such as VGG16, ResNet and DenseNet place the Batch Normalization layer after the ReLU activation layer. Our work is the first to demystify this debate and offer a comprehensive answer as to the proper order of Batch Normalization in neural network training. We demonstrate that for convolutional neural networks (CNNs) without skip connections, it is optimal to apply the ReLU activation before Batch Normalization, as a result of higher gradient flow. In residual networks with skip connections, the order affects neither the performance nor the gradient flow between the layers.
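To make the two orderings concrete, the following minimal PyTorch sketch builds a single convolutional block both ways (Conv→BN→ReLU, as in the original Batch Normalization paper, versus Conv→ReLU→BN, the alternative examined here). Channel sizes and the input shape are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; not taken from the paper.
IN_CH, OUT_CH = 3, 64

# Ordering advocated by the original BN paper: Conv -> BN -> ReLU.
bn_before_relu = nn.Sequential(
    nn.Conv2d(IN_CH, OUT_CH, kernel_size=3, padding=1),
    nn.BatchNorm2d(OUT_CH),
    nn.ReLU(inplace=True),
)

# Alternative ordering examined in this work: Conv -> ReLU -> BN.
relu_before_bn = nn.Sequential(
    nn.Conv2d(IN_CH, OUT_CH, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.BatchNorm2d(OUT_CH),
)

# Dummy batch to confirm both blocks produce tensors of the same shape.
x = torch.randn(8, IN_CH, 32, 32)
print(bn_before_relu(x).shape, relu_before_bn(x).shape)
```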
References
Chollet, F.: Keras (2015)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, chap. 8: Optimization for Training Deep Models. MIT Press, Cambridge (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_3
Paszke, A., Gross, S., Chintala, S., Chanan, G.: PyTorch. Computer software, version 0.3 (2017)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)