Abstract
In this study, we propose an integrated computer vision system that tracks multiple people and extracts their silhouettes using a pan-tilt stereo camera, in order to support gesture and gait recognition in Human–Robot Interaction (HRI). The proposed system consists of three modules: detection, tracking, and silhouette extraction. The modules are robust to camera movement and operate together in near real time. Detection is performed by camera ego-motion compensation and disparity segmentation. For tracking, we present an efficient mean shift-based method in which each tracked object is represented by a disparity-weighted color histogram. The silhouette is obtained by a two-step segmentation: a trimap is estimated first and then incorporated into the graph-cut framework for fine segmentation. Evaluated against ground-truth data, the proposed system detected and tracked multiple people reliably and produced high-quality silhouettes.
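To make the idea of a disparity-weighted color histogram concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes each pixel votes into a quantized RGB histogram with a weight that decays as its disparity moves away from the target's disparity. The Gaussian weighting, the `sigma` and `bins` parameters, and the function names are all illustrative assumptions; the Bhattacharyya coefficient is the similarity measure commonly used in mean shift tracking.

```python
import math


def disparity_weighted_histogram(pixels, target_disparity, sigma=2.0, bins=8):
    """Build a normalized RGB histogram with disparity-dependent votes.

    pixels: iterable of (r, g, b, disparity), color channels in 0..255.
    The Gaussian weight below is an assumed form, chosen for illustration.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b, d in pixels:
        # Pixels whose disparity is close to the target's get a weight near 1;
        # pixels at a different depth (likely background) get a weight near 0.
        w = math.exp(-0.5 * ((d - target_disparity) / sigma) ** 2)
        # Quantize each channel into `bins` levels and flatten to one index.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += w
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Toy window: a red person at disparity 20 in front of a blue wall at disparity 5.
person = [(250, 10, 10, 20.0)] * 50
wall = [(10, 10, 250, 5.0)] * 50
window = person + wall

# Target model: the person alone.
model = disparity_weighted_histogram(person, target_disparity=20.0)

# With disparity weighting the wall contributes almost nothing.
weighted = disparity_weighted_histogram(window, target_disparity=20.0)

# A very large sigma makes all weights nearly equal (a plain color histogram).
unweighted = disparity_weighted_histogram(window, target_disparity=20.0, sigma=1e9)
```

In this toy window the weighted histogram matches the person model more closely than the plain color histogram does, because background pixels at a different depth are suppressed before the similarity is computed.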
Acknowledgments
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) [IITA-2007-(C1090-0701-0046)]. It was also supported by the KOSEF (Korea Science and Engineering Foundation) (R01-2007-000-11683-0).
Cite this article
Ahn, JH., Choi, C., Kwak, S. et al. Human tracking and silhouette extraction for human–robot interaction systems. Pattern Anal Applic 12, 167–177 (2009). https://doi.org/10.1007/s10044-008-0112-3
DOI: https://doi.org/10.1007/s10044-008-0112-3