
3D pedestrian tracking and frontal face image capture based on head point detection

Multimedia Tools and Applications

Abstract

This paper proposes a method to track pedestrians in crowded scenes and capture close-up frontal face images of a person of interest (POI) for recognition. Pedestrians are tracked via the 3D positions of their head points (the highest point of a person) using two static overhead cameras. Head points are located and tracked based on geometric and color cues in the scene. Possible head areas in a frame acquired from one of the overhead cameras are determined based on projective geometry, and head areas belonging to the same person are clustered. Without creating a full disparity map of the scene, the 3D position of a pedestrian is obtained from the disparity along the line segment that passes through his/her head top. The 3D head position is then tracked using common assumptions on motion velocity. If the tracking is not accurate enough, the color distribution of the head top is integrated as a complementary cue. With the 3D head point information, a set of pan-tilt-zoom (PTZ) cameras is scheduled to capture frontal face images of the POI. The most suitable PTZ camera is selected by evaluating the capture quality of each PTZ camera and its current state. The approach is tested using a publicly available visual surveillance simulation test bed. The experiments show that the 3D tracking errors are around 4 cm and that high-quality frontal face images are captured.
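As a rough illustration of how a 3D head position can follow from a single disparity value, the sketch below triangulates one matched head-top pixel in a rectified stereo pair. It is not the paper's implementation: the pinhole model, the rectified geometry, the downward-looking cameras, and every name in it (`StereoRig`, `head_point_3d`, and all parameters) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoRig:
    """Hypothetical rectified overhead stereo pair (all fields are assumptions)."""
    focal_px: float         # focal length in pixels, assumed equal for both cameras
    baseline_m: float       # distance between the two camera centres, in metres
    cx: float               # principal point x, in pixels
    cy: float               # principal point y, in pixels
    camera_height_m: float  # height of the cameras above the floor, in metres


def head_point_3d(rig: StereoRig, u_left: float, v: float, u_right: float):
    """Triangulate one head-top pixel matched in both rectified images.

    Returns (x, y, height): lateral offsets from the left camera axis and
    the height of the point above the floor, all in metres.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")

    # Standard rectified-stereo relation: depth = f * B / d.
    depth = rig.focal_px * rig.baseline_m / disparity

    # Back-project the left-image pixel at that depth (pinhole model).
    x = (u_left - rig.cx) * depth / rig.focal_px
    y = (v - rig.cy) * depth / rig.focal_px

    # The cameras are assumed to look straight down, so depth is measured
    # from the ceiling and the head height is the remainder.
    height = rig.camera_height_m - depth
    return x, y, height


if __name__ == "__main__":
    rig = StereoRig(focal_px=800.0, baseline_m=0.5, cx=320.0, cy=240.0,
                    camera_height_m=4.0)
    print(head_point_3d(rig, u_left=420.0, v=240.0, u_right=260.0))
```

Only one correspondence per head is needed here, which is the point of avoiding a full disparity map; how that correspondence is found along the line segment through the head top is described in the paper itself and is not reproduced in this sketch.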




Author information

Corresponding author

Correspondence to Zhongchuan Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

$$ \operatorname{atan2}(y,x)=\begin{cases}\arctan\left(\frac{y}{x}\right) & \text{if } x>0\\ \arctan\left(\frac{y}{x}\right)+\pi & \text{if } x<0 \text{ and } y\ge 0\\ \arctan\left(\frac{y}{x}\right)-\pi & \text{if } x<0 \text{ and } y<0\\ \frac{\pi}{2} & \text{if } x=0 \text{ and } y>0\\ -\frac{\pi}{2} & \text{if } x=0 \text{ and } y<0\\ 0 & \text{if } x=0 \text{ and } y=0\end{cases} $$
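The piecewise definition above can be checked numerically. The following is a minimal sketch that transcribes the six cases directly and compares them with Python's built-in `math.atan2`; the function name `atan2_piecewise` is hypothetical and chosen only for this example.

```python
import math


def atan2_piecewise(y: float, x: float) -> float:
    """Four-quadrant arctangent, written case by case as in the appendix."""
    if x > 0:
        return math.atan(y / x)
    if x < 0:
        # Shift by +pi in the upper half-plane (y >= 0), by -pi in the lower.
        return math.atan(y / x) + (math.pi if y >= 0 else -math.pi)
    # From here on x == 0.
    if y > 0:
        return math.pi / 2
    if y < 0:
        return -math.pi / 2
    return 0.0  # the appendix assigns 0 to the origin


if __name__ == "__main__":
    for y, x in [(1, 1), (1, -1), (-1, -1), (-1, 1), (0, -1), (2, 0), (-2, 0), (0, 0)]:
        assert math.isclose(atan2_piecewise(y, x), math.atan2(y, x), abs_tol=1e-12)
    print("piecewise definition agrees with math.atan2 on the sample points")
```

In practice the library function is preferable, since it also handles signed zeros and infinities, which the piecewise form does not distinguish.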


Cite this article

Zhang, Z., Cohen, F. 3D pedestrian tracking and frontal face image capture based on head point detection. Multimed Tools Appl 79, 737–764 (2020). https://doi.org/10.1007/s11042-019-08121-y


  • DOI: https://doi.org/10.1007/s11042-019-08121-y
