Complementary spatial transformer network for real-time 3D object recognition

A tiny deep learning model in target space

Research, Journal of Real-Time Image Processing

Abstract

Tiny deep learning models offer many advantages in a range of applications. From the perspective of statistical machine learning theory, the contribution of this paper is to complement the research advances and results obtained so far in real-time 3D object recognition. We propose a tiny deep learning model named Complementary Spatial Transformer Network (CSTN) for real-time 3D object recognition. It turns out that the operation and analysis of CSTN are much simplified in a target space setting. We make algorithmic enhancements to speed up CSTN computations and to keep the learned part of CSTN minimal in size. Finally, we verify the results experimentally on the publicly available point cloud data sets ModelNet40 and ShapeNetCore, where our model performs 1.65–2 times better in DPS (detections/s) rate on GPU hardware for 3D object recognition than state-of-the-art networks. The CSTN architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, making the network easier to deploy on edge AI devices.
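The DPS (detections/s) figure quoted above is a throughput rate: objects classified per second of wall-clock time. The abstract does not describe the measurement protocol, so the following is only a minimal sketch of how such a rate and the resulting speedup ratio could be computed. The names `detections_per_second`, `speedup`, and `predict_batch` are hypothetical and do not come from the paper.

```python
import time


def detections_per_second(predict_batch, batches, warmup=1):
    """Return the number of objects classified per second of wall-clock time.

    predict_batch: hypothetical callable mapping one batch to one prediction
                   per object in that batch.
    batches:       list of batches (each batch is a sequence of objects).
    warmup:        number of leading batches run once, untimed, beforehand.
    """
    # Warm-up runs are excluded so one-off setup costs do not skew the rate.
    for batch in batches[:warmup]:
        predict_batch(batch)

    n_objects = 0
    start = time.perf_counter()
    for batch in batches:
        predictions = predict_batch(batch)
        n_objects += len(predictions)
    # Assumption: predict_batch returns only when its work is finished. With
    # asynchronous GPU execution the device would need to be synchronized
    # before the clock is read.
    elapsed = time.perf_counter() - start
    return n_objects / elapsed


def speedup(dps_model, dps_baseline):
    """Ratio of two DPS rates; 2.0 means twice the baseline throughput."""
    return dps_model / dps_baseline
```

Under this reading, a ratio between 1.65 and 2 returned by `speedup` would correspond to the range reported in the abstract, provided both rates are measured on the same hardware with the same batch size.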

Data availability

The data that support the findings of this study are publicly available at https://github.com/antao97/PointCloudDatasets under the MIT License, Copyright (c) 2019, An Tao.

Author information

Contributions

(1) We propose a new tiny machine learning model that generalizes volumetric deep learning (volumetric representation, VR) on 3D point clouds and connects the areas of computational topology and target subspace learning. Our experiments reveal the relationship between spatial entropy and Complementary Spatial Transformer Network performance from a statistical learning perspective. The proposed framework and network architecture complement the results obtained so far for volumetric representation 3D tiny deep learning models.

(2) The Complementary Spatial Transformer Network architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, making the network easy to deploy on edge AI devices.

(3) The Complementary Spatial Transformer Network architecture achieves 1.65–2 times more network throughput, measured in DPS (detections/s), on GPU hardware in batch mode for 3D object recognition when compared to state-of-the-art networks.
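The trainable-parameter comparison in contribution (2) comes down to counting the entries of every learnable tensor in each network and taking the ratio. The sketch below illustrates that arithmetic in plain Python. The layer shapes are invented for illustration and are not the CSTN or baseline architectures, and the helper names `count_parameters` and `parameter_fraction` are hypothetical.

```python
from math import prod


def count_parameters(shapes):
    """Total number of scalar parameters across tensors with the given shapes."""
    # In PyTorch the same count is commonly obtained with
    # sum(p.numel() for p in model.parameters() if p.requires_grad).
    return sum(prod(shape) for shape in shapes)


def parameter_fraction(model_shapes, baseline_shapes):
    """Trainable parameters of the model as a fraction of the baseline's."""
    return count_parameters(model_shapes) / count_parameters(baseline_shapes)


# Illustrative shapes only (weight and bias tensors of two small networks).
small_net = [(64, 3), (64,), (40, 64), (40,)]
baseline_net = [(256, 3), (256,), (40, 256), (40,)]

fraction = parameter_fraction(small_net, baseline_net)
print(f"small network uses {fraction:.1%} of the baseline's parameters")
```

A fraction between 0.10 and 0.35 computed this way would match the 10–35% range stated above.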

Corresponding author

Correspondence to K. P. Krishna Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Krishna Kumar, K.P., Paul, V. Complementary spatial transformer network for real-time 3D object recognition. J Real-Time Image Proc 20, 88 (2023). https://doi.org/10.1007/s11554-023-01340-5

  • DOI: https://doi.org/10.1007/s11554-023-01340-5
