
Stochastic Normalizations as Bayesian Learning

  • Conference paper
Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11362)

Abstract

In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is the randomness of batch statistics. This randomness appears in the parameters rather than in the activations and admits an interpretation as practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through an approximate Bayesian posterior.
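
As a purely illustrative aside (not the authors' algorithm), the following minimal NumPy sketch shows the effect the abstract alludes to: the mean and standard deviation computed on a random mini-batch fluctuate around the population values, so normalizing by them amounts to applying a random scale and shift shared by all samples in the batch, i.e. noise on the effective parameters rather than independent noise on individual activations. All sizes and values below are arbitrary.

```python
import numpy as np

# Illustration only: stochasticity of mini-batch statistics used by batch
# normalization, compared with the (deterministic) population statistics.
rng = np.random.default_rng(0)

# Pretend these are the pre-activations of one unit over the whole dataset.
population = rng.normal(loc=2.0, scale=3.0, size=100_000)
pop_mean, pop_std = population.mean(), population.std()

batch_size = 32
for _ in range(5):
    batch = rng.choice(population, size=batch_size, replace=False)
    m, s = batch.mean(), batch.std()
    # Normalizing with (m, s) instead of (pop_mean, pop_std) rescales the
    # deterministically normalized output by pop_std / s and shifts it by
    # (pop_mean - m) / s -- the same random factors for every sample in the
    # batch, which is why the noise can be read as noise on the parameters.
    print(f"batch mean {m:+.2f}, batch std {s:.2f}, "
          f"scale noise {pop_std / s:.3f}, shift noise {(pop_mean - m) / s:+.3f}")
```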

Notes

  1. Using well-known results for the distribution of the sample mean and variance of normally distributed variables. The inverse chi distribution is the distribution of \(1/S\) when \(S^2\) has a chi-squared distribution [13]; a precise statement is recalled below.
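
For completeness, a minimal statement of these standard results (paraphrased here, not quoted from [13]): for \(x_1,\dots,x_n\) drawn i.i.d. from \(\mathcal{N}(\mu,\sigma^2)\), with sample mean \(\bar{x}\) and unbiased sample variance \(S^2\),

\[
\bar{x} \sim \mathcal{N}\!\left(\mu,\ \tfrac{\sigma^2}{n}\right),
\qquad
\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1},
\]

so \(1/S = \sqrt{n-1}\,/\,(\sigma\sqrt{Y})\) with \(Y \sim \chi^2_{n-1}\), i.e. \(1/S\) follows a scaled inverse chi distribution.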

References

  1. Arpit, D., Zhou, Y., Kota, B.U., Govindaraju, V.: Normalization propagation: a parametric technique for removing internal covariate shift in deep networks. In: ICML, pp. 1168–1176 (2016)

  2. Atanov, A., Ashukha, A., Molchanov, D., Neklyudov, K., Vetrov, D.: Uncertainty estimation via stochastic batch normalization. In: ICLR Workshop Track (2018)

  3. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: ICML, pp. 1613–1622 (2015)

  4. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: ICLR (2016)

  5. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059 (2016)

  6. Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. In: NIPS, pp. 1027–1035 (2016)

  7. Gast, J., Roth, S.: Lightweight probabilistic deep networks. In: CVPR, June 2018

  8. Gitman, I., Ginsburg, B.: Comparison of batch normalization and weight normalization algorithms for the large-scale image classification. CoRR abs/1709.08145 (2017)

  9. Graves, A.: Practical variational inference for neural networks. In: NIPS, pp. 2348–2356 (2011)

  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  11. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, vol. 37, pp. 448–456 (2015)

  12. Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. In: NIPS, pp. 2575–2583 (2015)

  13. Lee, P.: Bayesian Statistics: An Introduction. Wiley, Hoboken (2012)

  14. Lei Ba, J., Kiros, J.R., Hinton, G.E.: Layer normalization. ArXiv e-prints, July 2016

  15. Li, X., Chen, S., Hu, X., Yang, J.: Understanding the disharmony between dropout and batch normalization by variance shift. CoRR abs/1801.05134 (2018)

  16. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer, New York (2015)

  17. Maška, M., et al.: A benchmark for comparison of cell tracking algorithms. Bioinformatics 30(11), 1609–1617 (2014)

  18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  19. Salimans, T., Kingma, D.P.: Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In: NIPS (2016)

  20. Santurkar, S., Tsipras, D., Ilyas, A., Madry, A.: How does batch normalization help optimization? (no, it is not about internal covariate shift). CoRR abs/1805.11604 (2018)

  21. Schulman, J., Heess, N., Weber, T., Abbeel, P.: Gradient estimation using stochastic computation graphs. In: NIPS, pp. 3528–3536 (2015)

  22. Shekhovtsov, A., Flach, B.: Normalization of neural networks using analytic variance propagation. In: Computer Vision Winter Workshop, pp. 45–53 (2018)

  23. Springenberg, J., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: ICLR (Workshop Track) (2015)

  24. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15, 1929–1958 (2014)

  25. Teye, M., Azizpour, H., Smith, K.: Bayesian uncertainty estimation for batch normalized deep networks. In: ICML (2018)

  26. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC, pp. 87.1–87.12, September 2016

Acknowledgments

A.S. has been supported by Czech Science Foundation grant 18-25383S and Toyota Motor Europe. B.F. gratefully acknowledges support by the Czech OP VVV project “Research Center for Informatics” (CZ.02.1.01/0.0/0.0/16_019/0000765).

Author information

Corresponding author

Correspondence to Alexander Shekhovtsov.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 651 KB)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Shekhovtsov, A., Flach, B. (2019). Stochastic Normalizations as Bayesian Learning. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol. 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_30

  • DOI: https://doi.org/10.1007/978-3-030-20890-5_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20889-9

  • Online ISBN: 978-3-030-20890-5

  • eBook Packages: Computer Science (R0)
