Abstract
In the human-following task, human detection, tracking and identification are fundamental steps that help the mobile robot follow the selected target person (STP) while maintaining an appropriate distance and orientation, without posing any threat. Recently, along with the widespread deployment of robots in general and service robots in particular, not only safety but also flexibility, naturalness and sociability are demanded at an increasingly high level in human-friendly services and collaborative tasks. This requirement makes robustly detecting, tracking and identifying the STP more challenging, since human–robot cooperation becomes more complex and unpredictable. Obviously, safe and natural robot behavior cannot be ensured if the STP is lost or the robot misidentifies its target. In this paper, a hierarchical approach is presented to update the states of the STP more robustly during the human-following task. The method is proposed with the goal of achieving good performance (robust, accurate and fast response) to support safe and natural robot behaviors on modest hardware. The proposed system is verified by a set of experiments and shows reasonable results.
Code or data availability
Not applicable.
References
Islam MJ, Hong J, Sattar J (2019) Person-following by autonomous robots: a categorical overview. Int J Robot Res 38(14):1581–1618
Rudenko A et al (2020) Human motion trajectory prediction: a survey. Int J Robot Res 39(8):895–935
Leigh A et al (2015) Person tracking and following with 2D laser scanners. In: 2015 IEEE international conference on robotics and automation (ICRA), Seattle, Washington, USA, 26–30 May 2015, pp 726–733
Yuan J et al (2018) Laser-based intersection-aware human following with a mobile robot in indoor environments. IEEE Trans Syst Man Cybern Syst 51(1):354–369
Beyer L et al (2018) Deep person detection in two-dimensional range data. IEEE Robot Autom Lett 3(3):2726–2733
Guerrero-Higueras AM et al (2019) Tracking people in a mobile robot from 2D LIDAR scans using full convolutional neural networks for security in cluttered environments. Front Neurorobot 12:85
Eguchi R, Yorozu A, Takahashi M (2019) Spatiotemporal and kinetic gait analysis system based on multisensor fusion of laser range sensor and instrumented insoles. In: 2019 IEEE international conference on robotics and automation (ICRA), Montreal, QC, Canada, 20–24 May 2019, pp 4876–4881
Duong HT, Suh YS (2020) Human gait tracking for normal people and walker users using a 2D LiDAR. IEEE Sens J 20(11):6191–6199
Cha D, Chung W (2020) Human-leg detection in 3D feature space for a person-following mobile robot using 2D LiDARs. Int J Precis Eng Manuf 21(7):1299–1307
Mandischer N et al (2021) Radar tracker for human legs based on geometric and intensity features. In: 2021 29th European signal processing conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021, pp 1521–1525
Eguchi R, Takahashi M (2022) Human leg tracking by fusion of laser range and insole force sensing with Gaussian mixture model-based occlusion compensation. IEEE Sens J 22(4):3704–3714
Torta E et al (2011) Design of robust robotic proxemic behavior. In: Social robotics: third international conference on social robotics, ICSR 2011, Amsterdam, The Netherlands, 24–25 November 2011, Proceedings 3, pp 21–30
Torta E et al (2013) Design of a parametric model of personal space for robotic social navigation. Int J Soc Robot 5(3):357–365
Truong X-T, Ngo T-D (2016) Dynamic social zone based mobile robot navigation for human comfortable safety in social environments. Int J Soc Robot 8(5):663–684
Van Toan N, Khoi PB (2019) Fuzzy-based-admittance controller for safe natural human-robot interaction. Adv Robot 33(15–16):815–823
Van Toan N, Khoi PB (2019) A control solution for closed-form mechanisms of relative manipulation based on fuzzy approach. Int J Adv Robot Syst 16(2):1–11
Van Toan N, Do MH, Jo J (2022) Robust-adaptive-behavior strategy for human-following robots in unknown environments based on fuzzy inference mechanism. Ind Robot Int J Robot Res Appl 49(6):1089–1100
Van Toan N et al (2023) The human-following strategy for mobile robots in mixed environments. Robot Auton Syst 160:104317
Van Toan N, Khoi PB, Yi SY (2021) A MLP-hedge-algebras admittance controller for physical human–robot interaction. Appl Sci 11(12):5459
Van Toan N, Yi S-Y, Khoi PB (2020) Hedge algebras-based admittance controller for safe natural human-robot interaction. Adv Robot 34(24):1546–1558
Khoi PB, Van Toan N (2018) Hedge-algebras-based controller for mechanisms of relative manipulation. Int J Precis Eng Manuf 19(3):377–385
Fosty B et al (2016) Accuracy and reliability of the RGB-D camera for measuring walking speed on a treadmill. Gait Posture 48:113–119
Koide K, Miura J (2016) Identification of a specific person using color, height, and gait features for a person following robot. Robot Auton Syst 84:76–87
Chen BX, Sahdev R, Tsotsos JK (2017) Integrating stereo vision with a CNN tracker for a person-following robot. In: International conference on computer vision systems. Springer, Berlin/Heidelberg, pp 300–313
Lee B-J et al (2018) Robust human following by deep Bayesian trajectory prediction for home service robots. In: 2018 IEEE international conference on robotics and automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018, pp 7189–7195
Yang C-A, Song K-T (2019) Control design for robotic human-following and obstacle avoidance using an RGB-D camera. In: 2019 19th International conference on control, automation and systems (ICCAS 2019), Jeju, South Korea, 15–18 October 2019, pp 934–939
Vilas-Boas MC et al (2019) Full-body motion assessment: concurrent validation of two body tracking depth sensors versus a gold standard system during gait. J Biomech 87:189–196
Yagi K et al (2020) Gait measurement at home using a single RGB camera. Gait Posture 76:136–140
Yorozu A, Takahashi M (2020) Estimation of body direction based on gait for service robot applications. Robot Auton Syst 132:103603
Redhwan A, Choi M-T (2020) Deep-learning-based indoor human following of mobile robot using color feature. Sensors (Basel) 20(9):2699
Van Toan N, Hoang MD, Jo J (2022) MoDeT: a low-cost obstacle tracker for self-driving mobile robot navigation using 2D-laser scan. Ind Robot Int J Robot Res Appl 49(6):1032–1041
Ren S et al (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the 28th international conference on neural information processing systems, Montreal, Canada, 7–12 December 2015, pp 91–99
Girshick R et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, 23–28 June 2014, pp 580–587
Dai J et al (2016) R-FCN: Object detection via region-based fully convolutional networks. In: Proceedings of the 30th international conference on neural information processing systems, Barcelona, Spain, 5–10 December 2016, pp 379–387
Redmon J et al (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp 779–788
Liu W et al (2016) SSD: Single shot multibox detector. In: European conference on computer vision, Amsterdam, Netherlands, 11–14 October 2016, pp 21–37
Vu T-H, Osokin A, Laptev I (2015) Context-aware CNNs for person head detection. In: 2015 IEEE international conference on computer vision (ICCV), Santiago, Chile, 07–13 December 2015, pp 2893–2901
Rashid M, Gu X, Lee YJ (2017) Interspecies knowledge transfer for facial keypoint detection. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017, pp 6894–6903
Girdhar R et al (2018) Detect-and-track: efficient pose estimation in videos. In: IEEE conference on computer vision and pattern recognition (CVPR), Salt Lake City, Utah, USA, 18–22 June 2018, pp 350–359
Hong M et al (2022) SSPNet: scale selection pyramid network for tiny person detection from UAV images. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2021.3103069
Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. Comput Vis Pattern Recognit. https://doi.org/10.48550/arXiv.1704.04861
Labeling Image (labelImg). Available at: https://github.com/heartexlabs/labelImg
King D (2017) High quality face recognition with deep metric learning. Available at: http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html
Huang GB, Learned-Miller E (2014) Labeled faces in the wild: updates and new reporting procedures. Technical Report UM-CS-2014–03, University of Massachusetts, Amherst
Huang GB et al (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07–49, University of Massachusetts, Amherst
Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification. Comput Vis Pattern Recognit. https://arxiv.org/abs/1703.07737
Yuan Y et al (2020) In defense of the triplet loss again: learning robust person re-identification with fast approximated triplet loss and label distillation. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020, pp 1454–1463
He K et al (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp 770–778
He K et al (2016) Identity mappings in deep residual networks. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. ECCV 2016. Lecture notes in computer science, vol 9908. Springer, Cham. https://doi.org/10.1007/978-3-319-46493-0_38
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(93):3221–3245
Ku J, Harakeh A, Waslander SL (2018) In defense of classical image processing: fast depth completion on the CPU. In: 2018 15th conference on computer and robot vision (CRV), Toronto, Canada, 9–11 May 2018, pp 16–22
Ku J et al (2018) Joint 3D proposal generation and object detection from view aggregation. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain, 01–05 October 2018, pp 1–8
Lahoud J, Ghanem B (2017) 2D-driven 3D object detection in RGB-D images. In: 2017 IEEE international conference on computer vision (ICCV), Venice, Italy, 22–29 October 2017, pp 4622–4630
Qi CR et al (2018) Frustum pointnets for 3D object detection from RGB-D data. In: 2018 IEEE conference on computer vision and pattern recognition (CVPR), Salt Lake City, Utah, USA, 18–22 June 2018, pp 918–927
Shi W et al (2018) Dynamic obstacles rejection for 3D map simultaneous updating. IEEE Access 6:37715–37724
Acknowledgements
Not applicable.
Funding
This work was supported by the Research Program funded by the Seoul National University of Science and Technology.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the author(s).
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
In the following tables, “Seq” indicates the sequence of the events, “Time” indicates the moment at which the action starts in the experiment, and “Snapshot” indicates the time instant at which the action starts in the video.
1.1 Appendix 1: Sequence of events of the experiment in this paper (in the demo video)
Seq | Time (mm:ss) | Action | Snapshot (mm:ss) |
---|---|---|---|
1 | 00:00 | Only the 2D-LiDAR sensor is used. The robot detects and tracks its STP and other persons appearing in its detection range | 
2 | 00:13 | Only the 2D-LiDAR sensor is used. The robot tracks the STP while he is too close to environmental objects | 
3 | 00:29 | Only the 2D-LiDAR sensor is used. The robot misidentifies its STP: the identification number of the STP is switched to an environmental object | 
4 | 00:35 | Hierarchical approach. The STP moves too close to environmental objects, so the robot activates the visual human detection | 
5 | 00:55 | Hierarchical approach. The STP moves away from environmental objects. With nothing around the STP, the visual-based modules are deactivated | 
6 | 01:20 | Hierarchical approach. Other people are close to the STP. The robot activates the visual human detection and face identification to identify the correct STP | 
7 | 01:53 | Hierarchical approach. Other people are close to the STP. The robot first activates the human detection and face identification to identify its STP; if the STP’s face is not visible, the body identification is activated | 
8 | 02:37 | Hierarchical approach. There are no other people or environmental objects near the STP, so the robot deactivates the visual-based modules | 
9 | 02:46 | Hierarchical approach. During the visual identification procedure, the STP is tracked (2D RGB human tracking and 3D point-cloud tracking) so that it can be matched with the tracked human legs after the visual identification is finished | 
10 | 02:58 | STP data collection for the visual identification procedure |
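The event list above implies a simple activation policy: the 2D-LiDAR leg tracker runs continuously, and the visual modules are switched on only when the LiDAR tracker alone becomes ambiguous. The Python fragment below is a minimal sketch of that policy, not the paper's implementation; the thresholds and all function and field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the paper's actual values are not given here.
OBJECT_PROXIMITY_M = 0.5   # STP considered "too close" to environmental objects
PERSON_PROXIMITY_M = 1.0   # other people considered "close" to the STP


@dataclass
class PerceptionModules:
    """Flags describing which perception modules are currently active."""
    lidar_leg_tracking: bool = True      # always on (2D-LiDAR)
    visual_human_detection: bool = False
    face_identification: bool = False
    body_identification: bool = False


def update_modules(dist_to_nearest_object_m: float,
                   dist_to_nearest_other_person_m: float,
                   stp_face_visible: bool) -> PerceptionModules:
    """Sketch of the hierarchical activation logic implied by the event list:
    LiDAR-only tracking by default, visual modules only when ambiguity arises."""
    m = PerceptionModules()

    # Events 4-5: STP near environmental objects -> enable visual human detection.
    if dist_to_nearest_object_m < OBJECT_PROXIMITY_M:
        m.visual_human_detection = True

    # Events 6-8: other people near the STP -> identify the STP visually.
    if dist_to_nearest_other_person_m < PERSON_PROXIMITY_M:
        m.visual_human_detection = True
        if stp_face_visible:
            m.face_identification = True
        else:
            # Event 7: no usable face view -> fall back to body identification.
            m.body_identification = True

    return m


if __name__ == "__main__":
    # STP walking in open space: only the LiDAR leg tracker stays active.
    print(update_modules(2.0, 3.0, stp_face_visible=True))
    # Another person nearby, face not visible: body identification takes over.
    print(update_modules(2.0, 0.6, stp_face_visible=False))
```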
1.2 Appendix 2: Effects and CPU consumption of the sub-methods in the visual identification procedure of Appendix 1
Seq | Time (mm:ss) | Action | Snapshot (mm:ss) |
---|---|---|---|
1 | 00:00 | Only the visual-based human detection is activated | |
2 | 00:31 | Visual-based human detection and body identification are activated | |
3 | 01:24 | Only the face identification is activated |
1.3 Appendix 3: Sequences of the flexible heading behavior during human–robot cooperation (presented in [17, 18])
Seq | Time (mm:ss) | Action | Snapshot (mm:ss) |
---|---|---|---|
1 | 00:00 (in [17]) | The robot flexibly changes its heading when the STP changes his moving direction and his side with respect to the robot's local coordinates (in [17]) | 
2 | 03:40 (in [17]) | The human-following task is disturbed by other people (in [17]) | 
3 | 00:00 (in [18]) | The robot follows and supports the STP to take food trays in the office | 
4 | 01:00 (in [18]) | Human–robot cooperation in narrow areas surrounded by many environmental objects and prohibited areas (in [18]) |
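References [17] and [18] regulate the robot's heading with fuzzy-based strategies, which are not reproduced here. The snippet below is only a minimal sketch of the geometric quantities such a flexible-heading behavior regulates, given the STP position in the robot's local frame; the desired following distance and all names are assumptions.

```python
import math

# Hypothetical parameter for illustration only.
DESIRED_FOLLOWING_DISTANCE_M = 1.2


def heading_and_distance_error(stp_x_m: float, stp_y_m: float):
    """Given the STP position in the robot's local frame (x forward, y left),
    return the heading error (rad) and distance error (m) a follower could
    drive toward zero to keep facing the STP at the desired range."""
    heading_error = math.atan2(stp_y_m, stp_x_m)  # 0 when the STP is dead ahead
    distance_error = math.hypot(stp_x_m, stp_y_m) - DESIRED_FOLLOWING_DISTANCE_M
    return heading_error, distance_error


if __name__ == "__main__":
    # STP 2 m ahead and 0.5 m to the robot's left.
    h, d = heading_and_distance_error(2.0, 0.5)
    print(f"heading error: {math.degrees(h):.1f} deg, distance error: {d:.2f} m")
```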
1.4 Appendix 4: Sequence of events of the experiment video of object tracking using the fusion of the 2D-LiDAR and RGB-D cameras
Seq | Action | Snapshot (mm:ss) |
---|---|---|
1 | The robot is not moving when detecting and tracking objects using only RGB-D cameras | |
2 | The robot is not moving when detecting and tracking objects using only RGB-D cameras. Here, the visualization of the filtered 3D point cloud data is turned on | |
3 | The robot is not moving when detecting and tracking objects using the fusion of the 2D-LiDAR and RGB-D cameras | |
4 | The robot is moving when detecting and tracking objects using the fusion of the 2D-LiDAR and RGB-D cameras | |
5 | The robot is not moving when detecting and tracking objects using the fusion of the 2D-LiDAR and RGB-D cameras |
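The paper's actual LiDAR/RGB-D fusion is not reproduced here; the following is a minimal, hypothetical sketch of one way leg tracks from the 2D-LiDAR could be associated with person centroids from the RGB-D cameras on the ground plane, using greedy nearest-neighbour matching within a gating distance. The gate value and all names are assumptions.

```python
import math

# Hypothetical gating distance for matching a LiDAR leg track to an RGB-D detection.
ASSOCIATION_GATE_M = 0.4


def associate_tracks(lidar_tracks, rgbd_detections):
    """Greedy nearest-neighbour association of 2D-LiDAR leg tracks with RGB-D
    person detections, both given as (x, y) points on the ground plane in the
    robot frame. Returns a dict {lidar_track_index: rgbd_detection_index}."""
    matches = {}
    used = set()
    for i, (lx, ly) in enumerate(lidar_tracks):
        best_j, best_d = None, ASSOCIATION_GATE_M
        for j, (cx, cy) in enumerate(rgbd_detections):
            if j in used:
                continue
            d = math.hypot(lx - cx, ly - cy)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches


if __name__ == "__main__":
    legs = [(1.8, 0.2), (2.5, -1.0)]        # leg-track positions from the 2D-LiDAR
    people = [(2.45, -0.95), (1.75, 0.25)]  # person centroids from the RGB-D camera
    print(associate_tracks(legs, people))   # {0: 1, 1: 0}
```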
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Van Toan, N., Bach, SH. & Yi, SY. A hierarchical approach for updating targeted person states in human-following mobile robots. Intel Serv Robotics 16, 287–306 (2023). https://doi.org/10.1007/s11370-023-00463-9