Abstract
In this work we investigate why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is the randomness of the batch statistics. This randomness appears in the parameters rather than in the activations and admits an interpretation as a practical form of Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through the approximate Bayesian posterior.
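To make the central observation concrete, here is a minimal, hypothetical NumPy sketch (not the paper's method): it contrasts batch normalization, whose mean and standard deviation fluctuate with the sampled mini-batch, with a data-independent normalization that uses fixed statistics but injects noise of the same sampling distribution directly into its shift and scale parameters. All function names and the exact noise model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    # Standard BN over the batch axis: the mean and std depend on the sampled
    # batch, so the effective shift and scale are random quantities.
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    return (x - mu) / (sigma + eps)

def fixed_norm_with_param_noise(x, mu_pop, sigma_pop, m, eps=1e-5):
    # Data-independent normalization with fixed population statistics, plus
    # noise injected into the shift/scale that mimics the sampling distribution
    # of batch statistics for a batch of size m: a Gaussian-distributed mean and
    # a (scaled) chi-distributed standard deviation. This is an illustrative
    # stand-in for sampling the normalization parameters in a Bayesian fashion.
    noisy_mu = mu_pop + sigma_pop * rng.standard_normal(mu_pop.shape) / np.sqrt(m)
    chi = np.sqrt(rng.chisquare(m - 1, size=sigma_pop.shape) / (m - 1))
    noisy_sigma = sigma_pop * chi
    return (x - noisy_mu) / (noisy_sigma + eps)

# Toy batch: 32 samples, 4 features, drawn from a shifted/scaled Gaussian.
x = rng.standard_normal((32, 4)) * 2.0 + 1.0
print(batch_norm(x).mean(axis=0))                                  # ~0 by construction
print(fixed_norm_with_param_noise(x, x.mean(0), x.std(0), m=32).mean(axis=0))  # ~0 up to parameter noise
```

Both layers produce nearly standardized activations, but only the second localizes the randomness in the shift and scale parameters, mirroring the paper's observation that the stochasticity introduced by BN lives in the parameters rather than in the activations.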
Notes
1. Using well-known results for the distribution of the sample mean and variance of normally distributed variables. The inverse chi distribution is the distribution of \(1/S\) when \(S^2\) has a chi-squared distribution [13].
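For reference, the standard sampling results the note relies on can be written as follows for i.i.d. Gaussian samples of size \(m\) (notation ours):
\[
X_1,\dots,X_m \sim \mathcal{N}(\mu,\sigma^2) \ \text{i.i.d.}, \qquad
\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i, \qquad
S^2 = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(X_i-\bar{X}\bigr)^2,
\]
\[
\bar{X} \sim \mathcal{N}\!\Bigl(\mu,\tfrac{\sigma^2}{m}\Bigr), \qquad
\frac{(m-1)S^2}{\sigma^2} \sim \chi^2_{m-1}
\;\Longrightarrow\;
\frac{\sqrt{m-1}\,S}{\sigma} \sim \chi_{m-1},
\]
so that \(1/S\) is, up to the scale factor \(\sqrt{m-1}/\sigma\), distributed according to the inverse chi distribution with \(m-1\) degrees of freedom.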
References
Arpit, D., Zhou, Y., Kota, B.U., Govindaraju, V.: Normalization propagation: a parametric technique for removing internal covariate shift in deep networks. In: ICML, pp. 1168–1176 (2016)
Atanov, A., Ashukha, A., Molchanov, D., Neklyudov, K., Vetrov, D.: Uncertainty estimation via stochastic batch normalization. In: ICLR Workshop Track (2018)
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: ICML, pp. 1613–1622 (2015)
Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: ICLR (2016)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059 (2016)
Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. In: NIPS, pp. 1027–1035 (2016)
Gast, J., Roth, S.: Lightweight probabilistic deep networks. In: CVPR, June 2018
Gitman, I., Ginsburg, B.: Comparison of batch normalization and weight normalization algorithms for the large-scale image classification. CoRR abs/1709.08145 (2017)
Graves, A.: Practical variational inference for neural networks. In: NIPS, pp. 2348–2356 (2011)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, vol. 37, pp. 448–456 (2015)
Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. In: NIPS, pp. 2575–2583 (2015)
Lee, P.: Bayesian Statistics: An Introduction. Wiley, Hoboken (2012)
Lei Ba, J., Kiros, J.R., Hinton, G.E.: Layer normalization. ArXiv e-prints, July 2016
Li, X., Chen, S., Hu, X., Yang, J.: Understanding the disharmony between dropout and batch normalization by variance shift. CoRR abs/1801.05134 (2018)
Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer, New York (2015)
Maška, M., et al.: A benchmark for comparison of cell tracking algorithms. Bioinformatics 30(11), 1609–1617 (2014)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Salimans, T., Kingma, D.P.: Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In: NIPS (2016)
Santurkar, S., Tsipras, D., Ilyas, A., Madry, A.: How does batch normalization help optimization? (no, it is not about internal covariate shift). CoRR abs/1805.11604 (2018)
Schulman, J., Heess, N., Weber, T., Abbeel, P.: Gradient estimation using stochastic computation graphs. In: NIPS, pp. 3528–3536 (2015)
Shekhovtsov, A., Flach, B.: Normalization of neural networks using analytic variance propagation. In: Computer Vision Winter Workshop, pp. 45–53 (2018)
Springenberg, J., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: ICLR (Workshop Track) (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15, 1929–1958 (2014)
Teye, M., Azizpour, H., Smith, K.: Bayesian uncertainty estimation for batch normalized deep networks. In: ICML (2018)
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC, pp. 87.1–87.12, September 2016
Acknowledgments
A.S. has been supported by Czech Science Foundation grant 18-25383S and Toyota Motor Europe. B.F. gratefully acknowledges support by the Czech OP VVV project “Research Center for Informatics” (CZ.02.1.01/0.0/0.0/16_019/0000765).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shekhovtsov, A., Flach, B. (2019). Stochastic Normalizations as Bayesian Learning. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_30
DOI: https://doi.org/10.1007/978-3-030-20890-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20889-9
Online ISBN: 978-3-030-20890-5
eBook Packages: Computer Science, Computer Science (R0)