
Efficient Deep Learning-based Semantic Mapping Approach using Monocular Vision for Resource-Limited Mobile Robots

  • Short Paper
  • Published: Journal of Intelligent & Robotic Systems

Abstract

In recent years, the demand for robots has not been limited to sophisticated industrial setups; there is also an unprecedented demand for low-cost robots in living spaces capable of performing human-centric operations. For semantic-rich mapping of arbitrary environments, current state-of-the-art techniques rely on sophisticated hardware such as Kinect sensors, Lidar, deep learning (DL)-based vision, and stereo vision systems. Inevitably, these systems increase the cost of the product, since they require expensive hardware to process the information. This creates a hurdle to deploying them on low-cost service robots, where interaction matters more than precision. To overcome these issues, in this paper we propose two novel techniques: 1) a lightweight yet efficient semantic mapping technique for scene-wise localization of objects that combines object detection with camera geometry; and 2) an accurate and robust novel integration technique that fuses scene-wise information into large-scale maps. The main goal of this framework is to host the semantic mapping process on a device with limited processing power, such as a Raspberry Pi. The resulting semantic information can be further integrated into any Human-Robot Interaction (HRI) system. A TensorFlow Lite version of the Single Shot Detector (SSD) for object detection, a wheel odometer for odometry tracking, and pinhole camera geometry are used for the whole mapping process. The proposed model has demonstrated promising results, accurately mapping the environment with semantic-rich features. The current work is time efficient and suitable for object-oriented task execution by low-cost robots, such as smart toys and other smart home gadgets.
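The abstract only outlines the geometry involved. As a rough illustrative sketch (not the authors' exact implementation), the Python snippet below shows how a detection's pixel location, pinhole camera geometry under a flat-floor assumption, and a wheel-odometry pose could be combined to place an object in the map frame; all intrinsics, mounting parameters, and function names here are hypothetical.

import math

# Hypothetical camera intrinsics and mounting; not values from the paper.
FX, FY = 600.0, 600.0          # focal lengths [pixels]
CX, CY = 320.0, 240.0          # principal point for a 640x480 image
CAM_HEIGHT = 0.20              # camera height above the floor [m]
CAM_PITCH = math.radians(10.0) # downward tilt of the camera [rad]

def pixel_to_ground(u, v):
    """Back-project a pixel (e.g. the bottom-center of an SSD box) onto the
    floor plane; returns (forward, left) in metres in the robot frame, or
    None if the ray never reaches the floor. Assumes a flat floor and that
    the detected object touches the ground."""
    # Normalized viewing ray in the camera frame (x right, y down, z forward).
    rx = (u - CX) / FX
    ry = (v - CY) / FY
    rz = 1.0
    # Account for the camera pitch so the ray is expressed in a level frame.
    ry_level = ry * math.cos(CAM_PITCH) + rz * math.sin(CAM_PITCH)
    rz_level = -ry * math.sin(CAM_PITCH) + rz * math.cos(CAM_PITCH)
    if ry_level <= 1e-6:
        return None
    scale = CAM_HEIGHT / ry_level  # stretch the ray until it hits the floor
    return rz_level * scale, -rx * scale

def to_map_frame(obj_local, robot_pose):
    """Transform an object position from the robot frame to the map frame
    using the wheel-odometry pose (x, y, heading theta)."""
    x, y, theta = robot_pose
    fwd, left = obj_local
    return (x + fwd * math.cos(theta) - left * math.sin(theta),
            y + fwd * math.sin(theta) + left * math.cos(theta))

# Example: a "chair" box bottom-center at pixel (400, 400), seen while the
# robot is at x = 1.0 m, y = 0.5 m, heading 30 degrees (from wheel odometry).
local = pixel_to_ground(400, 400)
if local is not None:
    print("chair at (map frame):", to_map_frame(local, (1.0, 0.5, math.radians(30.0))))

Scene-wise object estimates produced in this way could then be merged across frames to build the large-scale semantic map that the second proposed technique describes.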


Data Availability

The data that support the findings of this study are available upon reasonable request.


Acknowledgements

The support of the Aeronautical Research and Development Board (Grant No. DARO/08/1051450/M/I) is gratefully acknowledged.

Author information


Contributions

AS, KR, AMR - Conceptualization; AS, KR - Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft, Writing - review and editing; AA, KR - Formal analysis, Investigation, Methodology, Writing - original draft; AS, KR, AMR - Writing - review and editing; AMR - Supervision, Project administration.

Corresponding author

Correspondence to Arunabha M. Roy.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

We used a publicly available open-source dataset; thus, informed consent for the data included in the study is not required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Singh, A., Raj, K. & Roy, A.M. Efficient Deep Learning-based Semantic Mapping Approach using Monocular Vision for Resource-Limited Mobile Robots. J Intell Robot Syst 109, 69 (2023). https://doi.org/10.1007/s10846-023-01988-y


Keywords

Navigation