Rotation-equivariant spherical vector networks for objects recognition with unknown poses

  • Original article
  • The Visual Computer

Abstract

Analyzing 3D objects without pose priors using neural networks is challenging. Spherical convolutional networks do not construct a rotation-equivariant part–whole hierarchy for recognizing 3D objects with unknown poses, so the rotation-equivariant features they generate for the whole object cannot be effectively preserved. To address this shortcoming, this paper proposes a rotation-equivariant part–whole hierarchy spherical vector network. In our experiments, we map a 3D object onto the unit sphere, construct an ordered list of vectors from the convolutional layers of a rotation-equivariant spherical convolutional network, and then construct a part–whole hierarchy to classify the 3D object using the proposed rotation-equivariant routing algorithm. The experimental results show that, compared to previous spherical convolutional neural networks, the proposed method improves the recognition of 3D objects with both known and unknown poses. This finding validates the construction of the rotation-equivariant part–whole hierarchy.
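
As a rough illustration of the first step in the abstract (mapping a 3D object onto the unit sphere), the sketch below bins a small point set into an equiangular grid and checks one special case of rotation equivariance. It is a simplified stand-in under stated assumptions, not the projection used in the paper: the function names `spherical_signal` and `rotate_z`, the grid size, the sample points and the choice of storing the largest radial distance per cell are all hypothetical.

```python
import math


def spherical_signal(points, n_theta=8, n_phi=16):
    """Bin 3D points into an equiangular (theta, phi) grid on the unit sphere.

    Each cell stores the largest radial distance of any point whose
    direction falls into that cell (0.0 if the cell is empty).
    This binning scheme is an assumption made for illustration.
    """
    grid = [[0.0] * n_phi for _ in range(n_theta)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # the origin has no direction on the sphere
        theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle in [0, pi]
        phi = math.atan2(y, x) % (2.0 * math.pi)       # azimuth in [0, 2*pi)
        i = min(int(theta / math.pi * n_theta), n_theta - 1)
        j = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
        grid[i][j] = max(grid[i][j], r)
    return grid


def rotate_z(points, angle):
    """Rotate every point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


# Hypothetical sample points, chosen to sit away from cell boundaries.
points = [(1.0, 0.2, 0.3), (-0.5, 0.7, -1.0), (0.3, -0.9, 0.1), (1.0, -0.18, 0.5)]
n_phi = 16

grid = spherical_signal(points, n_phi=n_phi)
# Rotate the object by exactly one azimuthal grid step, then project again.
rotated = spherical_signal(rotate_z(points, 2.0 * math.pi / n_phi), n_phi=n_phi)
# Cyclically shift every row of the original grid by one column.
shifted = [row[-1:] + row[:-1] for row in grid]

same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(rotated, shifted)
    for a, b in zip(row_a, row_b)
)
print("rotate-then-project equals project-then-shift:", same)
```

In this toy setting, rotating the object about the z-axis by one grid step and then projecting gives the same result as projecting first and shifting the grid by one column. That only covers azimuthal rotations aligned with the grid; handling arbitrary 3D rotations is what the rotation-equivariant spherical convolutional network and routing algorithm in the paper are for.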

Data availability

We evaluate our method on the public MNIST, ModelNet40 and SHREC15 datasets, which are available at http://yann.lecun.com/exdb/mnist/, http://modelnet.cs.princeton.edu/ and https://www.icst.pku.edu.cn/zlian/representa/3d15/dataset/index.htm, respectively.

Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 62071260 and 62006131) and the Natural Science Foundation of Zhejiang Province (Grant Nos. LZ22F020001 and LQ21F020009).

Author information

Corresponding author

Correspondence to Jieyu Zhao.

Ethics declarations

Conflict of interest

We declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, H., Zhao, J. & Zhang, Q. Rotation-equivariant spherical vector networks for objects recognition with unknown poses. Vis Comput 40, 2089–2101 (2024). https://doi.org/10.1007/s00371-023-02904-z

  • DOI: https://doi.org/10.1007/s00371-023-02904-z
