
A novel target detection and localization method in indoor environment for mobile robot based on improved YOLOv5

Multimedia Tools and Applications

Abstract

Indoor mobile robots, especially those that serve elderly and disabled people, are becoming increasingly important for improving their users' quality of life. Much of the interest in this field stems from the robots' ability to help people grasp or carry objects. Accurate detection and localization of targets in indoor environments is the prerequisite for this task. To this end, this paper proposes a novel indoor target detection and localization method based on an improved YOLOv5 for an indoor mobile robot equipped with a KinectV2 camera. First, we built an indoor scene dataset containing 2000 RGB images and 2000 depth images to make the 2D detection model more robust to image blur, strong and weak illumination, and target occlusion. Second, we proposed an improved YOLOv5-S network for indoor 2D target detection and verified its effectiveness both theoretically and experimentally. On our dataset, the improved YOLOv5-S detector achieves an mAP@0.5 of 95.9% at 65.36 FPS. Third, we proposed an improved mean filtering method to process the depth value at the target center point and thereby suppress the noise in the depth image. Finally, we derived and organized the transformation of the target center point from the 2D pixel coordinate system to the 3D camera coordinate system, and calibrated our KinectV2 camera with the chessboard calibration method, so as to obtain the 3D localization of the target center point. In localization experiments over the range 0.5 m–3 m, the mean absolute error (MAE) of the localization results is only 11.59 mm, which demonstrates the effectiveness of the proposed method.
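The last two steps of the abstract (depth filtering at the target center, then the 2D-to-3D transformation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the improved mean filter, so a plain mean over the valid (non-zero) depth readings in a small window stands in for it, and the standard pinhole back-projection is assumed for the coordinate transformation. The function names, the window size, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical placeholders; real intrinsics would come from the chessboard calibration, and the depth image is assumed to be registered to the RGB image.

```python
import numpy as np


def window_mean_depth(depth_mm, u, v, half=2):
    """Mean of the valid (non-zero) depth readings in a window around pixel (u, v).

    Stand-in for the paper's improved mean filter, whose details are not
    given in the abstract. `u` is the column index, `v` the row index.
    """
    h, w = depth_mm.shape
    r0, r1 = max(v - half, 0), min(v + half + 1, h)
    c0, c1 = max(u - half, 0), min(u + half + 1, w)
    patch = depth_mm[r0:r1, c0:c1].astype(np.float64)
    valid = patch[patch > 0]  # assumption: 0 marks a missing depth reading
    if valid.size == 0:
        raise ValueError("no valid depth reading near the target center")
    return float(valid.mean())


def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into camera coordinates (pinhole model)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])


# Usage with synthetic data: a flat surface 1000 mm away with one missing pixel.
depth = np.full((5, 5), 1000, dtype=np.uint16)
depth[2, 2] = 0  # simulated hole at the target center
z = window_mean_depth(depth, u=2, v=2)
point = pixel_to_camera(u=2, v=2, z=z, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

Averaging only the valid readings keeps a hole at the center pixel from dragging the estimate toward zero, which is one plausible motivation for filtering before back-projection.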


Data availability

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.


Author information

Corresponding author

Correspondence to Chunhua Hu.

Ethics declarations

Conflict of interest

None of the authors of this paper has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper. The authors specifically state that “No Competing interests are at stake and there is No Conflict of interest”.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qian, W., Hu, C., Wang, H. et al. A novel target detection and localization method in indoor environment for mobile robot based on improved YOLOv5. Multimed Tools Appl 82, 28643–28668 (2023). https://doi.org/10.1007/s11042-023-14569-w


  • DOI: https://doi.org/10.1007/s11042-023-14569-w
