Learning geometric invariants through neural networks

  • Original article
  • Published in The Visual Computer

Abstract

Convolutional neural networks have become a fundamental model for solving various computer vision tasks. However, their operations are invariant only to translations of objects, and their performance suffers under rotation and other affine transformations. This work proposes a novel neural network that leverages geometric invariants, including curvature, higher-order differentials of curves extracted from object boundaries at multiple scales, and the relative orientations of edges. These features are invariant to affine transformations and can improve the robustness of shape recognition in neural networks. In our experiments on the smallNORB dataset, a 2-layer network operating over these geometric invariants outperforms a 3-layer convolutional network by 9.69% while being more robust to affine transformations, even when trained without any data augmentation. Notably, our network exhibits a mere 6% degradation in test accuracy when test images are rotated by 40\(^{\circ }\), in contrast to the significant drops of 51.7% and 69% observed in VGG networks and convolutional networks, respectively, under the same transformations. Additionally, our models show greater robustness than invariant feature descriptors such as the SIFT-based bag-of-words classifier and its rotation-invariant extension, the RIFT descriptor, which suffer drops of 35% and 14.1%, respectively, under similar image transformations. Our experimental results further show improved robustness against scale and shear transformations. Furthermore, the multi-scale extension of our geometric invariant network, which extracts curve differentials of higher orders, shows enhanced robustness to scaling and shearing transformations.
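To illustrate why relative edge orientations are insensitive to rotation, the following minimal sketch (an illustration of the general idea only, not the network proposed in the paper; the helper relative_orientation_histogram is hypothetical) builds a histogram of pairwise differences between gradient orientations: a rotation adds the same angle to every absolute orientation, so the differences, and therefore the histogram, remain approximately unchanged.

import numpy as np

def relative_orientation_histogram(image, bins=16):
    # Gradient orientations from finite differences; np.gradient returns
    # derivatives along (rows, columns), i.e. (gy, gx).
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)
    # Keep only the strongest edge responses.
    strong = orientation[magnitude > np.percentile(magnitude, 95)]
    # Pairwise orientation differences: rotating the image shifts every
    # absolute orientation by the same angle, so these differences (and
    # hence this histogram) are approximately rotation invariant.
    diffs = np.mod(strong[:, None] - strong[None, :], 2 * np.pi).ravel()
    hist, _ = np.histogram(diffs, bins=bins, range=(0, 2 * np.pi), density=True)
    return hist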



Data Availability Statement

All the datasets analyzed and used in our experiments are freely available for public use and research in the following repositories: https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/ and, in Mendeley Data format, https://data.mendeley.com/datasets/55xv4y25rs, which can be cited as: Rai, Arpit (2022), “smallNORB”, Mendeley Data, V1, doi: 10.17632/55xv4y25rs.1. The different transformations applied to the datasets during the experiments are part of TensorFlow, its image-processing module, and the TensorFlow Addons library.
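For readers who wish to reproduce the evaluation setup, the snippet below sketches how such test-time transformations could be applied with TensorFlow and TensorFlow Addons. The wrapper names (rotate_images, shear_images, scale_images) and the default parameter values are illustrative assumptions, not the exact code or settings used in the experiments.

import math
import tensorflow as tf
import tensorflow_addons as tfa

def rotate_images(images, degrees=40.0):
    # images: [batch, height, width, channels]; tfa.image.rotate expects radians.
    return tfa.image.rotate(images, math.radians(degrees), interpolation="bilinear")

def shear_images(images, shear=0.2):
    # Shear along x via a projective transform [a0, a1, a2, b0, b1, b2, c0, c1].
    return tfa.image.transform(images, [1.0, shear, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])

def scale_images(images, factor=1.2):
    # Scale by resizing, then crop or pad back to the original spatial size.
    height, width = images.shape[1], images.shape[2]
    resized = tf.image.resize(images, [int(height * factor), int(width * factor)])
    return tf.image.resize_with_crop_or_pad(resized, height, width)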


Acknowledgements

The author would like to thank the University of Edinburgh, the parent institution, for its resources, and the Google Colaboratory team for the GPUs provided for training and testing the models.

Funding

No funding was received from any organization for conducting the study and the experiments.

Author information


Corresponding author

Correspondence to Arpit Rai.

Ethics declarations

Conflict of interest

The author has no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Code for the curvature filter operation \(C_{x}, C_{y}\) over the image

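The original listing appears as an image in the published version and is not reproduced here. The following is a minimal sketch of one plausible implementation, assuming the filters \(C_{x}, C_{y}\) act on Sobel derivatives \(I_x, I_y\) and that curvature is computed with the standard isophote (level-set) formula; it is not the author's code.

import tensorflow as tf

def sobel_derivatives(image):
    # image: [batch, height, width, 1] float tensor.
    # tf.image.sobel_edges returns [batch, h, w, 1, 2] with (dy, dx) last.
    grads = tf.image.sobel_edges(image)
    i_y = grads[..., 0, 0]
    i_x = grads[..., 0, 1]
    return i_x, i_y

def curvature_filter(image, eps=1e-6):
    # Isophote curvature of the image level sets:
    # kappa = (I_xx I_y^2 - 2 I_x I_y I_xy + I_yy I_x^2) / (I_x^2 + I_y^2)^{3/2}
    i_x, i_y = sobel_derivatives(image)
    i_xx, i_xy = sobel_derivatives(i_x[..., None])
    _, i_yy = sobel_derivatives(i_y[..., None])
    numerator = i_xx * i_y**2 - 2.0 * i_x * i_y * i_xy + i_yy * i_x**2
    denominator = tf.pow(i_x**2 + i_y**2 + eps, 1.5)
    return numerator / denominator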

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rai, A. Learning geometric invariants through neural networks. Vis Comput 40, 7093–7106 (2024). https://doi.org/10.1007/s00371-024-03398-z


Keywords