Complementary spatial transformer network for real-time 3D object recognition

Krishna Kumar, K. P.; Paul, Varghese

doi:10.1007/s11554-023-01340-5

A tiny deep learning model in target space

Research
Published: 22 July 2023

Volume 20, article number 88, (2023)

Journal of Real-Time Image Processing

K. P. Krishna Kumar¹ &
Varghese Paul²

Abstract

Tiny deep learning models offer many advantages in various applications. From the perspective of statistical machine learning theory, the contribution of this paper is to complement the research advances and results obtained so far in real-time 3D object recognition. We propose a tiny deep learning model named Complementary Spatial Transformer Network (CSTN) for real-time 3D object recognition. The working and analysis of CSTN turn out to be much simpler in a target space setting. We make algorithmic enhancements that speed up CSTN computations and keep the learning part of CSTN minimal in size. Finally, we verify the results experimentally on the publicly available point cloud data sets ModelNet40 and ShapeNetCore: our model achieves a 1.65–2 times higher DPS (detections/s) rate on GPU hardware for 3D object recognition than state-of-the-art networks. The CSTN architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, making the network easier to deploy on edge AI devices.
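The DPS (detections/s) figure quoted in the abstract is a batch-mode throughput. The sketch below shows one way such a rate could be measured; it is an illustration under stated assumptions, not the authors' benchmark code. The names `detections_per_second`, `relative_throughput`, and `dummy_classifier` are hypothetical, and the dummy classifier merely stands in for a real network.

```python
import time
from typing import Callable, Sequence


def detections_per_second(
    infer_batch: Callable[[Sequence], Sequence],
    batches: Sequence[Sequence],
    warmup: int = 1,
) -> float:
    """Return the number of processed samples per second over all batches."""
    # Warm-up runs are excluded from timing (caches, lazy initialisation).
    for batch in batches[:warmup]:
        infer_batch(batch)

    total = 0
    start = time.perf_counter()
    for batch in batches:
        outputs = infer_batch(batch)
        total += len(outputs)  # one detection per input sample (assumption)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


def relative_throughput(dps_model: float, dps_baseline: float) -> float:
    """Ratio of two DPS values measured under the same conditions."""
    return dps_model / dps_baseline


def dummy_classifier(batch):
    # Hypothetical stand-in for a network: one class label per point cloud.
    return [0 for _ in batch]


# Five batches of eight toy point clouds with 16 points each.
batches = [[[(0.0, 0.0, 0.0)] * 16 for _ in range(8)] for _ in range(5)]
dps = detections_per_second(dummy_classifier, batches)
```

When the model runs on a GPU, kernels execute asynchronously, so the device would need to be synchronised before each clock reading for the timing to be meaningful. A ratio such as the 1.65–2 times reported above is only comparable when both networks are timed with the same batch size and hardware.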

Author information

Authors and Affiliations

APJ Abdul Kalam Technological University, CET Campus, Thiruvananthapuram, 695016, Kerala, India
K. P. Krishna Kumar
Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Rajagiri Valley, Kochi, 682 039, Kerala, India
Varghese Paul

Contributions

(1) A new tiny machine learning model is proposed that generalizes volumetric deep learning (volumetric representation, VR) on 3D point clouds and connects the areas of computational topology and target subspace learning. Our experiments reveal the relationship between spatial entropy and Complementary Spatial Transformer Network performance from a statistical learning perspective. The proposed framework and the working of the network architecture complement the results obtained so far in volumetric representation 3D tiny deep learning models.
(2) The Complementary Spatial Transformer Network architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, making the network easy to deploy on edge AI devices.
(3) The Complementary Spatial Transformer Network architecture achieves 1.65–2 times higher network throughput, measured in DPS (detections/s), on GPU hardware in batch mode for 3D object recognition when compared to state-of-the-art networks.
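The contributions refer to spatial entropy without defining it on this page. One plausible reading, assumed here purely for illustration, is the Shannon entropy of how the points of a cloud are distributed over the cells of a voxel grid. The function name `voxel_entropy` and the `grid` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def voxel_entropy(points, grid=4):
    """Shannon entropy (in bits) of point occupancy over a grid**3 voxelisation."""
    if not points:
        return 0.0
    # Axis-aligned bounding box of the cloud.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Guard against zero extent along an axis.
    spans = [(maxs[i] - mins[i]) or 1.0 for i in range(3)]

    def voxel(p):
        # Map each coordinate to a cell index in [0, grid - 1].
        return tuple(
            min(int((p[i] - mins[i]) / spans[i] * grid), grid - 1)
            for i in range(3)
        )

    counts = Counter(voxel(p) for p in points)
    n = len(points)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Eight corners of the unit cube: one point in each cell of a 2x2x2 grid.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
cube_entropy = voxel_entropy(cube, grid=2)
```

Under this assumed definition, a cloud concentrated in a single voxel has entropy 0, and a cloud spread evenly over all `grid**3` voxels reaches the maximum of `3 * log2(grid)` bits.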

Corresponding author

Correspondence to K. P. Krishna Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Krishna Kumar, K.P., Paul, V. Complementary spatial transformer network for real-time 3D object recognition. J Real-Time Image Proc 20, 88 (2023). https://doi.org/10.1007/s11554-023-01340-5

Received: 24 May 2023
Accepted: 03 July 2023
Published: 22 July 2023
DOI: https://doi.org/10.1007/s11554-023-01340-5
