Abstract
Tiny deep learning models offer many advantages in a variety of applications. From the perspective of statistical machine learning theory, the contribution of this paper is to complement the research advances and results obtained so far in real-time 3D object recognition. We propose a tiny deep learning model named Complementary Spatial Transformer Network (CSTN) for real-time 3D object recognition. The operation and analysis of CSTN are considerably simpler in a target space setting. We make algorithmic enhancements that speed up CSTN computations and keep the learned part of CSTN minimal in size. Finally, we verify the results experimentally on the publicly available point cloud data sets ModelNet40 and ShapeNetCore, where our model achieves a 1.65–2 times higher DPS (detections/s) rate on GPU hardware for 3D object recognition than state-of-the-art networks. The CSTN architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, which makes the network easier to deploy on edge AI devices.
Data availability
The data that support the findings of this study are publicly available at https://github.com/antao97/PointCloudDatasets under the MIT License, Copyright (c) 2019, An Tao.
Author information
Contributions
(1) We propose a new tiny machine learning model that generalizes volumetric deep learning (volumetric representation, VR) on 3D point clouds and connects computational topology with target subspace learning. Our experiments reveal the relationship between spatial entropy and Complementary Spatial Transformer Network performance from a statistical learning perspective. The proposed framework and the working of the network architecture complement the results obtained so far for volumetric-representation 3D tiny deep learning models.
(2) The Complementary Spatial Transformer Network architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, which makes the network easy to deploy on edge AI devices.
(3) The Complementary Spatial Transformer Network architecture achieves 1.65–2 times the network throughput, measured in DPS (detections/s), of state-of-the-art networks on GPU hardware in batch mode for 3D object recognition.
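To make the two headline metrics concrete, the following is a minimal sketch of how a batch-mode DPS (detections/s) rate and the relative figures could be computed. It is not the authors' benchmarking code: the names `detections_per_second`, `relative_throughput`, `parameter_fraction`, and the `predict_batch` callable are hypothetical and introduced only for illustration, and the sketch assumes that one detection is counted per object classified.

```python
import time
from typing import Callable, Iterable, Sequence


def detections_per_second(
    predict_batch: Callable[[Sequence], Sequence],  # hypothetical model wrapper
    batches: Iterable[Sequence],
    warmup: int = 1,
) -> float:
    """Return the number of objects classified per second of wall-clock time."""
    batches = list(batches)
    # Warm-up passes are run but not timed (caches, lazy initialisation).
    for batch in batches[:warmup]:
        predict_batch(batch)
    detections = 0
    start = time.perf_counter()
    for batch in batches:
        # Assumption: the model returns one prediction per input object.
        detections += len(predict_batch(batch))
    # On a GPU the work is typically asynchronous; a real benchmark would
    # synchronise the device (e.g. torch.cuda.synchronize()) before this line.
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure")
    return detections / elapsed


def relative_throughput(dps_model: float, dps_baseline: float) -> float:
    """Ratio of two DPS rates; the paper reports values of 1.65 to 2."""
    return dps_model / dps_baseline


def parameter_fraction(model_params: int, baseline_params: int) -> float:
    """Trainable parameters as a fraction of a baseline; the paper reports 0.10 to 0.35."""
    return model_params / baseline_params
```

With a real network, `predict_batch` would wrap the forward pass on one batch of point clouds, and the same batches would be fed to the baseline network so that the two DPS rates are comparable.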
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Krishna Kumar, K.P., Paul, V. Complementary spatial transformer network for real-time 3D object recognition. J Real-Time Image Proc 20, 88 (2023). https://doi.org/10.1007/s11554-023-01340-5
DOI: https://doi.org/10.1007/s11554-023-01340-5