
Efficient Deep Learning-based Semantic Mapping Approach using Monocular Vision for Resource-Limited Mobile Robots

  • Short Paper
  • Published: Journal of Intelligent & Robotic Systems

Abstract

In recent years, the demand for robots has not been limited to sophisticated industrial setups; there is also an unprecedented demand for low-cost robots in living spaces capable of performing human-centric operations. For semantic-rich mapping of arbitrary environments, current state-of-the-art techniques rely on sophisticated hardware such as Kinect sensors, Lidar, deep learning (DL)-based vision, and stereo vision systems. Inevitably, these systems increase the cost of the product, since they require expensive hardware to process the information. This creates a hurdle to deploying them on low-cost service robots, where interaction matters more than precision. To overcome these issues, in this paper we propose two novel techniques: 1) a lightweight yet efficient semantic mapping technique for scene-wise localization of objects that combines object detection with camera geometry; and 2) an accurate and robust novel integration technique that fuses scene-wise information into large-scale maps. The main goal of this framework is to host the semantic mapping process on a device with limited processing power, such as a Raspberry Pi. The resulting semantic information can be further integrated into any Human-Robot Interaction (HRI) system. A TensorFlow Lite version of the Single Shot Detector (SSD) for object detection, a wheel odometer for odometry tracking, and pinhole camera geometry are used for the whole mapping process. The proposed model has demonstrated promising results, accurately mapping the environment with semantic-rich features. The current work is time efficient and suitable for object-oriented task execution by low-cost robots, such as smart toys and other smart home gadgets.
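The abstract only outlines the geometry involved. As a rough illustrative sketch (not the authors' exact implementation), the Python snippet below shows how a detection's pixel location, pinhole camera geometry under a flat-floor assumption, and a wheel-odometry pose could be combined to place an object in the map frame; all intrinsics, mounting parameters, and function names here are hypothetical.

import math

# Hypothetical camera intrinsics and mounting; not values from the paper.
FX, FY = 600.0, 600.0          # focal lengths [pixels]
CX, CY = 320.0, 240.0          # principal point for a 640x480 image
CAM_HEIGHT = 0.20              # camera height above the floor [m]
CAM_PITCH = math.radians(10.0) # downward tilt of the camera [rad]

def pixel_to_ground(u, v):
    """Back-project a pixel (e.g. the bottom-center of an SSD box) onto the
    floor plane; returns (forward, left) in metres in the robot frame, or
    None if the ray never reaches the floor. Assumes a flat floor and that
    the detected object touches the ground."""
    # Normalized viewing ray in the camera frame (x right, y down, z forward).
    rx = (u - CX) / FX
    ry = (v - CY) / FY
    rz = 1.0
    # Account for the camera pitch so the ray is expressed in a level frame.
    ry_level = ry * math.cos(CAM_PITCH) + rz * math.sin(CAM_PITCH)
    rz_level = -ry * math.sin(CAM_PITCH) + rz * math.cos(CAM_PITCH)
    if ry_level <= 1e-6:
        return None
    scale = CAM_HEIGHT / ry_level  # stretch the ray until it hits the floor
    return rz_level * scale, -rx * scale

def to_map_frame(obj_local, robot_pose):
    """Transform an object position from the robot frame to the map frame
    using the wheel-odometry pose (x, y, heading theta)."""
    x, y, theta = robot_pose
    fwd, left = obj_local
    return (x + fwd * math.cos(theta) - left * math.sin(theta),
            y + fwd * math.sin(theta) + left * math.cos(theta))

# Example: a "chair" box bottom-center at pixel (400, 400), seen while the
# robot is at x = 1.0 m, y = 0.5 m, heading 30 degrees (from wheel odometry).
local = pixel_to_ground(400, 400)
if local is not None:
    print("chair at (map frame):", to_map_frame(local, (1.0, 0.5, math.radians(30.0))))

Scene-wise object estimates produced in this way could then be merged across frames to build the large-scale semantic map that the second proposed technique describes.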


Data Availability

The data that support the findings of this study are available upon reasonable request.


Acknowledgements

The support of the Aeronautical Research and Development Board (Grant No. DARO/08/1051450/M/I) is gratefully acknowledged.

Author information


Contributions

AS, KR, AMR - Conceptualization; AS, KR - Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft, Writing - review and editing; AA, KR - Formal analysis, Investigation, Methodology, Writing - original draft; AS, KR, AMR - Writing - review and editing; AMR - Supervision, Project administration.

Corresponding author

Correspondence to Arunabha M. Roy.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

We used a publicly available open-source dataset; thus, informed consent for the data included in the study is not required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Singh, A., Raj, K. & Roy, A.M. Efficient Deep Learning-based Semantic Mapping Approach using Monocular Vision for Resource-Limited Mobile Robots. J Intell Robot Syst 109, 69 (2023). https://doi.org/10.1007/s10846-023-01988-y


Keywords

Navigation