Fast RGB-D people tracking for service robots

Published in: Autonomous Robots

Abstract

Service robots must robustly follow and interact with humans. In this paper, we propose a very fast multi-people tracking algorithm designed for mobile service robots. Our approach exploits RGB-D data and runs in real time at a very high frame rate on a standard laptop, without the need for a GPU implementation. It also features a novel depth-based sub-clustering method that makes it possible to detect people within groups or standing near walls. Moreover, to limit drift and track ID switches, we propose an online-learning appearance classifier featuring a three-term joint likelihood. We compared the performance of our system with a number of state-of-the-art tracking algorithms on two public datasets, acquired with three static Kinects and a moving stereo pair, respectively. To validate the 3D accuracy of our system, we created a new dataset in which RGB-D data are acquired by a moving robot. We made this dataset publicly available; it is not only annotated by hand, but the ground-truth positions of the people and the robot are also acquired with a motion capture system, so that tracking accuracy and precision can be evaluated in 3D coordinates. Results of experiments on these datasets are presented, showing that, even without a GPU, our approach achieves state-of-the-art accuracy and superior speed.
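The three-term joint likelihood mentioned above is defined in the body of the paper; as a hedged illustration only, the sketch below combines three terms commonly used in tracking-by-detection (a motion term, an online appearance-classifier score, and a detector confidence). The function names, terms, and weights are assumptions for illustration, not the paper's exact formulation.

```python
import math

def joint_log_likelihood(motion_dist2, appearance_score, detector_conf,
                         w_motion=1.0, w_app=1.0, w_det=1.0):
    """Sketch of a three-term joint log-likelihood for track-detection
    association. Terms and weights are illustrative assumptions."""
    # Motion term: Gaussian log-likelihood of the detection given the
    # track's predicted position (motion_dist2 = squared Mahalanobis distance).
    log_motion = -0.5 * motion_dist2
    # Appearance term: log of the online appearance classifier's score in (0, 1].
    log_app = math.log(max(appearance_score, 1e-9))
    # Detection term: log of the people detector's confidence in (0, 1].
    log_det = math.log(max(detector_conf, 1e-9))
    return w_motion * log_motion + w_app * log_app + w_det * log_det

# A detection that is close to the track's prediction, looks like the
# tracked person, and is confidently detected scores highest.
good = joint_log_likelihood(motion_dist2=0.2, appearance_score=0.9, detector_conf=0.9)
bad = joint_log_likelihood(motion_dist2=4.0, appearance_score=0.3, detector_conf=0.5)
assert good > bad
```

Summing the terms in log space (i.e., multiplying the likelihoods) means a detection must be plausible under all three cues at once: a single weak term pulls the joint likelihood down, which is what helps limit identity switches.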


Notes

  1. http://www.microsoft.com/en-us/kinectforwindows.

  2. http://dinast.com/cyclopes-od.

  3. http://www.pmdtec.com/product_sservices/pmd_photonics_specs.php.

  4. http://www.dei.unipd.it/~munaro/KTP-dataset.html.

  5. http://www.btsbioengineering.com.

  6. Contained in Piotr Dollár's Matlab toolbox http://vision.ucsd.edu/~pdollar/toolbox.

  7. http://pascal.inrialpes.fr/data/human.

  8. http://pointclouds.org/documentation/tutorials/ground_based_rgbd_people_detection.php.

  9. http://www.ime.unicamp.br/~cnaber/mvnprop.pdf.

  10. http://www.informatik.uni-freiburg.de/~spinello/RGBD-dataset.html.

  11. Bayes++ - http://bayesclasses.sourceforge.net.

  12. Both computers had 4GB DDR3 memory.

  13. This is the resolution used for most of the tests reported in this paper.



Acknowledgments

We wish to thank the Bioengineering of Movement Laboratory of the University of Padova for providing the motion capture facility, in particular Martina Negretto and Annamaria Guiotto for their help with the data acquisition, and all the people who took part in the KTP Dataset. We also wish to thank Filippo Basso and Stefano Michieletto, co-authors of the previous publications related to this work, and Mauro Antonello for his advice on the disparity computation for the ETH dataset.

Author information


Corresponding author

Correspondence to Matteo Munaro.


Cite this article

Munaro, M., Menegatti, E. Fast RGB-D people tracking for service robots. Auton Robot 37, 227–242 (2014). https://doi.org/10.1007/s10514-014-9385-0
