
Training 1-Bit Networks on a Sphere: A Geometric Approach

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13531)


Abstract

Weight binarization offers a promising route to building highly efficient Deep Neural Networks (DNNs) that can be deployed on low-power, resource-constrained devices. However, given their discrete nature, training 1-bit DNNs is neither straightforward nor uniquely defined, and several strategies have been proposed to address this issue, each bringing performance closer to that of full-precision counterparts. In this paper we analyze 1-bit DNNs from a differential geometry perspective. We start from the observation that, for a given model with \(d\) binary weights, all possible weight configurations lie on a sphere of radius \(\sqrt{d}\). Alongside the traditional training procedure based on the Straight Through Estimator (STE), we leverage concepts from Riemannian optimization to constrain the search space to spherical manifolds, a particular class of Riemannian manifolds. This approach offers a principled solution; in practice, however, we found that simply constraining the norm of the underlying auxiliary weights works just as effectively. We further observe that by enforcing a unit norm on the network parameters, the network explores a space of well-conditioned matrices. Complementing this, we define an angle-based regularization that guides exploration of the auxiliary space. To demonstrate the effectiveness of our approach, we binarize a ResNet architecture and evaluate it on image classification with the CIFAR-100 and ImageNet datasets.
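To make the two ingredients of the abstract concrete, the sketch below combines sign binarization with a straight-through gradient estimator and a projection of the auxiliary (latent full-precision) weights onto the sphere of radius \(\sqrt{d}\). This is a minimal PyTorch sketch based only on the abstract, not the authors' released implementation; the clipped-STE backward pass and the post-step rescaling are standard choices assumed here.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator (STE)."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # forward pass uses 1-bit weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass gradients straight through, clipped to the region |w| <= 1
        # (a common STE variant; assumed here, not specified by the paper).
        return grad_output * (w.abs() <= 1).float()

def project_to_sphere(w: torch.Tensor) -> torch.Tensor:
    """Rescale auxiliary weights onto the sphere of radius sqrt(d),
    the sphere on which every d-dimensional binary configuration lies."""
    d = w.numel()
    return w * (d ** 0.5 / w.norm())

# Hypothetical training step: binarize for the forward pass, update the
# auxiliary weights, then project them back onto the sphere.
w = torch.randn(256, requires_grad=True)   # auxiliary full-precision weights
opt = torch.optim.SGD([w], lr=0.1)

w_bin = BinarizeSTE.apply(w)               # weights the layer actually uses
loss = (w_bin.sum() - 1.0) ** 2            # placeholder loss for illustration
loss.backward()
opt.step()
with torch.no_grad():
    w.copy_(project_to_sphere(w))          # enforce the spherical constraint
```

A Riemannian treatment would instead move along the sphere's tangent space and retract after each step; the plain rescaling above is the simpler norm constraint that the abstract reports as working just as effectively.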

Supported by the Australian Centre for Robotic Vision.



Author information

Correspondence to Luis Guerra.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guerra, L., Thalaiyasingam, A., Avraham, G., Zou, Y., Drummond, T. (2022). Training 1-Bit Networks on a Sphere: A Geometric Approach. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13531. Springer, Cham. https://doi.org/10.1007/978-3-031-15934-3_65


  • DOI: https://doi.org/10.1007/978-3-031-15934-3_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15933-6

  • Online ISBN: 978-3-031-15934-3

