Abstract
Convolutional neural networks have become a fundamental model for solving various computer vision tasks. However, convolution operations are invariant only to translations of objects, and their performance suffers under rotation and other affine transformations. This work proposes a novel neural network that leverages geometric invariants, including curvature, higher-order differentials of curves extracted from object boundaries at multiple scales, and the relative orientations of edges. These features are invariant to affine transformations and can improve the robustness of shape recognition in neural networks. In our experiments on the smallNORB dataset, a 2-layer network operating over these geometric invariants outperforms a 3-layer convolutional network by 9.69% while being more robust to affine transformations, even when trained without any data augmentation. Notably, our network exhibits a mere 6% degradation in test accuracy when test images are rotated by 40\(^{\circ}\), in contrast to the significant drops of 51.7% and 69% observed in VGG networks and convolutional networks, respectively, under the same transformation. Additionally, our models are more robust than invariant feature descriptors such as the SIFT-based bag-of-words classifier and its rotation-invariant extension, the RIFT descriptor, which suffer drops of 35% and 14.1%, respectively, under similar image transformations. Our experimental results further show improved robustness against scale and shear transformations. Furthermore, the multi-scale extension of our geometric invariant network, which extracts curve differentials of higher orders, shows enhanced robustness to scaling and shearing transformations.
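To make the notion of a geometric invariant concrete, the following is a minimal sketch, not the network's actual feature extractor. It assumes a closed boundary curve given as sampled points and checks only the rotation case: the signed curvature \(\kappa = (x'y'' - y'x'')/(x'^2 + y'^2)^{3/2}\) is unchanged when the curve is rotated by 40\(^{\circ}\). The function names `curvature` and `rotate` and the ellipse test curve are illustrative assumptions.

```python
import numpy as np


def curvature(points):
    """Signed curvature of a closed curve sampled as an (N, 2) array."""
    x, y = points[:, 0], points[:, 1]
    # Central differences with wrap-around, because the curve is closed.
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    # The ratio does not depend on the parametrisation step size.
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


def rotate(points, degrees):
    """Rotate 2-D points counter-clockwise about the origin."""
    theta = np.deg2rad(degrees)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    return points @ rotation.T


# Hypothetical boundary: an ellipse with semi-axes 3 and 1.5.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ellipse = np.stack([3.0 * np.cos(t), 1.5 * np.sin(t)], axis=1)

k_original = curvature(ellipse)
k_rotated = curvature(rotate(ellipse, 40.0))
# Largest pointwise change in curvature caused by the rotation.
max_difference = np.max(np.abs(k_original - k_rotated))
```

The difference stays at the level of floating-point round-off because both the cross product in the numerator and the speed in the denominator are preserved by rotations. Raw pixel intensities at fixed image positions do not have this property.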






Data Availability Statement
All datasets analyzed and used in our experiments are freely available for public use and research at https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/ and, in Mendeley format, at https://data.mendeley.com/datasets/55xv4y25rs, which can be cited as: Rai, Arpit (2022), “smallNORB”, Mendeley Data, V1, doi: 10.17632/55xv4y25rs.1. The transformations applied to the datasets during the experiments are provided by TensorFlow, the TensorFlow image processing module, and the tensorflow-addons library.
Acknowledgements
The author would like to thank the University of Edinburgh, the parent institution, for its resources, and the Google Colaboratory team for providing the GPUs used for training and testing the models.
Funding
No funding was received from any organization for conducting the study and the experiments.
Ethics declarations
Conflict of interest
The author has no conflict of interest to declare that is relevant to the content of this article.
Appendices
Code for the curvature filter operation \(C_{x}, C_{y}\) over the image
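The appendix listing itself is not preserved in this text, so the block below is only an illustrative sketch and should not be read as the author's code. It assumes that \(C_{x}\) and \(C_{y}\) can be interpreted as the x- and y-derivatives of the unit gradient direction of the image, whose sum is the curvature of the image's level curves. That interpretation, the function name `curvature_components`, the use of `np.gradient` (central differences) in place of a Sobel filter, and the synthetic test image are all assumptions.

```python
import numpy as np


def curvature_components(image, eps=1e-8):
    """Hypothetical C_x, C_y: derivatives of the unit gradient direction."""
    image = np.asarray(image, dtype=np.float64)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(image)
    magnitude = np.sqrt(gx**2 + gy**2) + eps  # eps avoids division by zero
    nx, ny = gx / magnitude, gy / magnitude
    c_x = np.gradient(nx, axis=1)  # d(nx)/dx
    c_y = np.gradient(ny, axis=0)  # d(ny)/dy
    return c_x, c_y


# Synthetic image whose intensity equals the distance from the centre,
# so every level curve is a circle of radius r with curvature 1 / r.
size = 65
yy, xx = np.mgrid[0:size, 0:size] - size // 2
image = np.sqrt(xx**2 + yy**2)

c_x, c_y = curvature_components(image)
kappa = c_x + c_y  # level-curve curvature under the stated assumption
```

On this test image, `kappa` at a pixel 20 columns to the right of the centre should be close to 1/20, up to discretisation error. A real pipeline would typically smooth the image first, since second-order differences amplify noise.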

About this article
Cite this article
Rai, A. Learning geometric invariants through neural networks. Vis Comput 40, 7093–7106 (2024). https://doi.org/10.1007/s00371-024-03398-z
DOI: https://doi.org/10.1007/s00371-024-03398-z