
Efficient deep learning-based semantic mapping approach using monocular vision for resource-limited mobile robots

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Semantic mapping remains challenging for household collaborative robots. Deep learning models have proven capable of extracting semantics from a scene and learning robot odometry. To interface semantic information with robot odometry, existing approaches extract semantics and odometry separately and then integrate them with fusion techniques. Such approaches face many issues during integration, and the mapping procedure demands substantial memory and computational resources. In an attempt to produce accurate semantic maps on resource-limited devices, this paper proposes an efficient deep learning-based model that simultaneously estimates robot odometry from monocular frame sequences and detects objects in those frames. The proposed model has two main components: a YOLOv3 object detector used as a backbone, and a convolutional long short-term memory (Conv-LSTM) recurrent neural network that models changes in camera pose. The unique advantage of the proposed model is that it eliminates the need for data association and multi-sensor fusion. We conducted experiments on a LoCoBot robot in a laboratory environment, attaining satisfactory results with such limited computational resources. Additionally, we tested the proposed method on the KITTI dataset, reaching an average test loss of 15.93 across various sequences. The experiments are documented in this video: https://www.youtube.com/watch?v=hnmqwxpaTEw.
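The architecture described above pairs a detection backbone with a recurrent pose estimator. The PyTorch sketch below illustrates how such a combination could be wired together in a single network; it is not the authors' implementation, and the ConvLSTMCell and SemanticOdometryNet names, the layer sizes, and the simplified stand-in for the YOLOv3 backbone are all illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the authors' code): a YOLO-style
# convolutional backbone shared by an object-detection head and a Conv-LSTM
# pose head that regresses 6-DoF camera motion over a monocular sequence.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single convolutional LSTM cell (gates computed with 2D convolutions)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class SemanticOdometryNet(nn.Module):
    """Shared backbone -> (a) per-frame detection map, (b) 6-DoF pose per frame."""
    def __init__(self, hid_ch=64):
        super().__init__()
        # Small stand-in for a pretrained YOLOv3 (Darknet-53) feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.conv_lstm = ConvLSTMCell(128, hid_ch)
        self.pose_head = nn.Linear(hid_ch, 6)   # translation (3) + rotation (3)
        self.det_head = nn.Conv2d(128, 85, 1)   # illustrative YOLO-style output

    def forward(self, frames):
        # frames: (batch, time, 3, H, W) monocular sequence
        b, t = frames.shape[:2]
        feat = self.backbone(frames[:, 0])      # run once to size the hidden state
        state = (torch.zeros(b, self.conv_lstm.hid_ch, *feat.shape[-2:]),
                 torch.zeros(b, self.conv_lstm.hid_ch, *feat.shape[-2:]))
        poses, detections = [], []
        for i in range(t):
            feat = self.backbone(frames[:, i])
            detections.append(self.det_head(feat))              # per-frame objects
            state = self.conv_lstm(feat, state)                 # temporal pose memory
            poses.append(self.pose_head(state[0].mean(dim=(2, 3))))
        return torch.stack(detections, 1), torch.stack(poses, 1)

# Example: two frames of a 256x256 monocular sequence
model = SemanticOdometryNet()
dets, poses = model(torch.randn(1, 2, 3, 256, 256))
print(dets.shape, poses.shape)  # torch.Size([1, 2, 85, 32, 32]) torch.Size([1, 2, 6])
```

In this sketch the backbone features feed both a per-frame detection head and a Conv-LSTM cell whose hidden state accumulates motion cues across frames, so detections and odometry come out of one forward pass rather than from separate pipelines fused afterwards, which is the integration the abstract says the model avoids.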



Author information


Corresponding author

Correspondence to Aditya Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Singh, A., Narula, R., Rashwan, H.A. et al. Efficient deep learning-based semantic mapping approach using monocular vision for resource-limited mobile robots. Neural Comput & Applic 34, 15617–15631 (2022). https://doi.org/10.1007/s00521-022-07273-7


Keywords

Navigation