Abstract
This paper proposes a method to track pedestrians in crowded scenes and capture close-up frontal face images of a person of interest (POI) for recognition. Pedestrians are tracked via the 3D positions of their head points (the highest point of a person) using two static overhead cameras. Head points are located and tracked based on the geometric and color cues in the scene. Possible head areas in a frame acquired from one of the overhead cameras are determined based on projective geometry, and head areas belonging to the same person are clustered. Without creating a full disparity map of the scene, the 3D position of a pedestrian is obtained from the disparity along the line segment that passes through the top of the person's head. The 3D head position is then tracked using common assumptions on motion velocity. If this tracking is not accurate enough, the color distribution of the head top is integrated as a complementary cue. With the 3D head point information, a set of pan-tilt-zoom (PTZ) cameras is scheduled to capture frontal face images of the POI. The most suitable PTZ camera is selected by evaluating the capture quality of each PTZ camera and its current state. The approach is tested using a publicly available visual surveillance simulation test bed. The experiments show that the 3D tracking errors are around 4 cm and that high-quality frontal face images are captured.
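As a rough illustration of how a disparity measured at a head top can yield a 3D head point, the sketch below triangulates a single point from a rectified stereo pair looking straight down. It is not the authors' implementation: the rectified pinhole model, the `StereoRig` parameters, and the function names are all assumptions made for this example. The speed gate is likewise only one common form of a motion velocity assumption; the abstract does not say which assumption the paper uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoRig:
    """Hypothetical rectified overhead stereo pair (every value is an assumption)."""
    focal_px: float      # focal length, in pixels
    baseline_m: float    # distance between the two camera centres, in metres
    cx: float            # principal point, x coordinate (pixels)
    cy: float            # principal point, y coordinate (pixels)
    cam_height_m: float  # height of the cameras above the floor, in metres


def head_point_3d(rig, u_left, u_right, v):
    """Triangulate one head point from its pixel columns in the two images.

    Returns (x, y, height_above_floor) in metres. x and y are expressed in the
    left camera frame; the optical axis is assumed to point straight down.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point below the rig")
    depth = rig.focal_px * rig.baseline_m / disparity  # distance below the cameras
    x = (u_left - rig.cx) * depth / rig.focal_px
    y = (v - rig.cy) * depth / rig.focal_px
    return x, y, rig.cam_height_m - depth


def plausible_move(prev_point, curr_point, dt_s, max_speed_mps=3.0):
    """Accept a track update only if the implied speed stays under a cap.

    The 3 m/s default is an illustrative walking-speed bound, not a value
    taken from the paper.
    """
    return math.dist(prev_point, curr_point) / dt_s <= max_speed_mps


if __name__ == "__main__":
    rig = StereoRig(focal_px=800.0, baseline_m=0.5, cx=320.0, cy=240.0,
                    cam_height_m=4.0)
    print(head_point_3d(rig, u_left=400.0, u_right=240.0, v=240.0))
```

Because only the disparity at the head top is needed, a computation of this kind runs once per candidate head rather than once per pixel, which is consistent with the abstract's point about avoiding a full disparity map.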
Cite this article
Zhang, Z., Cohen, F. 3D pedestrian tracking and frontal face image capture based on head point detection. Multimed Tools Appl 79, 737–764 (2020). https://doi.org/10.1007/s11042-019-08121-y
DOI: https://doi.org/10.1007/s11042-019-08121-y