
Training 1-Bit Networks on a Sphere: A Geometric Approach

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13531)


Abstract

Weight binarization offers a promising route to building highly efficient Deep Neural Networks (DNNs) that can be deployed on low-power, resource-constrained devices. However, given their discrete nature, training 1-bit DNNs is neither straightforward nor uniquely defined, and several strategies have been proposed to address this issue, each bringing performance closer to that of full-precision counterparts. In this paper we analyze 1-bit DNNs from a differential geometry perspective. We start from the observation that, for a given model with \(d\) binary weights, all possible weight configurations lie on a sphere of radius \(\sqrt{d}\). Alongside the traditional training procedure based on the Straight Through Estimator (STE), we leverage concepts from Riemannian optimization to constrain the search space to spherical manifolds, a particular class of Riemannian manifolds. This approach offers a principled solution; in practice, however, we found that simply constraining the norm of the underlying auxiliary weights works just as effectively. We further observe that by enforcing a unit norm on the network parameters, the network explores a space of well-conditioned matrices. Complementing this, we define an angle-based regularization that guides exploration of the auxiliary space. To demonstrate the effectiveness of our approach, we binarize a ResNet architecture and evaluate it on image classification with the CIFAR-100 and ImageNet datasets.
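To make the two ingredients of the abstract concrete, the sketch below combines sign binarization with a straight-through gradient estimator and a projection of the auxiliary (latent full-precision) weights onto the sphere of radius \(\sqrt{d}\). This is a minimal PyTorch sketch based only on the abstract, not the authors' released implementation; the clipped-STE backward pass and the post-step rescaling are standard choices assumed here.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator (STE)."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # forward pass uses 1-bit weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass gradients straight through, clipped to the region |w| <= 1
        # (a common STE variant; assumed here, not specified by the paper).
        return grad_output * (w.abs() <= 1).float()

def project_to_sphere(w: torch.Tensor) -> torch.Tensor:
    """Rescale auxiliary weights onto the sphere of radius sqrt(d),
    the sphere on which every d-dimensional binary configuration lies."""
    d = w.numel()
    return w * (d ** 0.5 / w.norm())

# Hypothetical training step: binarize for the forward pass, update the
# auxiliary weights, then project them back onto the sphere.
w = torch.randn(256, requires_grad=True)   # auxiliary full-precision weights
opt = torch.optim.SGD([w], lr=0.1)

w_bin = BinarizeSTE.apply(w)               # weights the layer actually uses
loss = (w_bin.sum() - 1.0) ** 2            # placeholder loss for illustration
loss.backward()
opt.step()
with torch.no_grad():
    w.copy_(project_to_sphere(w))          # enforce the spherical constraint
```

A Riemannian treatment would instead move along the sphere's tangent space and retract after each step; the plain rescaling above is the simpler norm constraint that the abstract reports as working just as effectively.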

Supported by the Australian Centre for Robotic Vision.



Author information

Correspondence to Luis Guerra.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guerra, L., Thalaiyasingam, A., Avraham, G., Zou, Y., Drummond, T. (2022). Training 1-Bit Networks on a Sphere: A Geometric Approach. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13531. Springer, Cham. https://doi.org/10.1007/978-3-031-15934-3_65


  • DOI: https://doi.org/10.1007/978-3-031-15934-3_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15933-6

  • Online ISBN: 978-3-031-15934-3

