Pylon grid: A fast method for human head detection in range images
Introduction
There is a class of applications that make use of automatic tracking and counting of people in a scene. One example is consumer path tracking in shopping malls in order to collect statistical data about the consumers' behavior. Another application is people counting and pedestrian flow analysis in public areas.
An essential component of any tracking system is the detector, which determines the positions of the people in each frame of the image sequence. In the context of stereo vision, the problem of precise detection can be attacked effectively with a vertical camera orientation (Fig. 1).
While the vertical camera orientation limits the size of the observed area, it provides a clear view with minimal occlusions among the people. This enables a reliable segmentation of the people from the ground, with accurate head detection, even in crowded scenes with a high density of people.
Another advantage of the vertical camera orientation is that human heads appear as local minima in the range image (Fig. 2). Considering this, and assuming that human bodies have convex silhouettes in the range image, the problem of head detection can be reduced to finding all local minima in the range image.
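To make the reduction concrete, the following minimal sketch finds all strict local minima of a small range image by brute force. It is a naive baseline for illustration only, not the Pylon Grid algorithm; the function name `find_local_minima`, the 8-pixel neighborhood, and the list-of-lists image layout are assumptions made for this example.

```python
def find_local_minima(range_image):
    """Return (row, col) of every pixel strictly smaller than all of its
    in-bounds 8-neighbors. Assumption: the image is a non-ragged list of
    rows holding range values (distance from the camera)."""
    rows = len(range_image)
    cols = len(range_image[0]) if rows else 0
    minima = []
    for r in range(rows):
        for c in range(cols):
            value = range_image[r][c]
            is_minimum = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and range_image[nr][nc] <= value:
                        is_minimum = False
            if is_minimum:
                minima.append((r, c))
    return minima


# Toy scene: ground at range 9, two "heads" closer to the camera.
toy_scene = [
    [9, 9, 9, 9, 9, 9],
    [9, 4, 9, 9, 9, 9],
    [9, 9, 9, 9, 3, 9],
    [9, 9, 9, 9, 9, 9],
]
print(find_local_minima(toy_scene))  # [(1, 1), (2, 4)]
```

This exhaustive scan visits every pixel and its neighbors, so it only serves to show what "head as local minimum" means on idealized data; flat regions and noisy data would need additional handling.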
Related work
Vision-based algorithms for the people counting and tracking problem may be divided into two main classes: the direct and the indirect methods [1]. The direct methods [2], [3], [4], [5] are based on the detection of each person in the scene, by using some form of segmentation and object detection algorithm. The counting, in turn, is performed as a second step. The indirect methods [1], [6], [7], [8] instead perform the people counting by using the measurement of some features that do not
Hill climbing
There is a class of hill-climbing algorithms that are able to find one local minimum in sublinear time. The approach is based on putting a climber somewhere in the range image and then checking the surrounding neighbor pixels at a close distance. The climber descends by moving to the neighbor pixel with the smallest value. He continues the search from there until he reaches a position where no neighbor has a smaller value, which is a local minimum.
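The descent described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `descend`, the 8-pixel neighborhood used as the "close distance", and the list-of-lists image layout are all hypothetical choices for this example.

```python
def descend(range_image, start):
    """Move a climber from `start` to a local minimum by repeatedly stepping
    to the smallest-valued 8-neighbor (assumed neighborhood). Stops when no
    neighbor has a strictly smaller value than the current pixel."""
    rows, cols = len(range_image), len(range_image[0])
    r, c = start
    while True:
        best_r, best_c = r, c
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and range_image[nr][nc] < range_image[best_r][best_c]:
                    best_r, best_c = nr, nc
        if (best_r, best_c) == (r, c):
            return (r, c)  # no smaller neighbor: local minimum reached
        r, c = best_r, best_c  # value strictly decreases, so the loop ends


# Toy bowl-shaped range image with a single minimum in the center.
bowl = [
    [9, 8, 7, 8, 9],
    [8, 6, 5, 6, 8],
    [7, 5, 2, 5, 7],
    [8, 6, 5, 6, 8],
    [9, 8, 7, 8, 9],
]
print(descend(bowl, (0, 0)))  # (2, 2)
```

Each step only inspects a constant number of neighbors, so a single climber touches a small fraction of the image. A single climber finds only one minimum, however, and which one it finds depends on the starting position.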
This method is well suited for strictly
Methodology
First we will discuss the ideal case where the one-to-one relationship between local minima and human heads is given, and human bodies have a convex range silhouette with one minimum at the top of the head.
Preprocessing of the range data
Even when the stereo parameters are well-calibrated to the scene, the range images can still contain invalid or incomplete data, so that the one-to-one relationship between human heads in the scene and local minima in the range image does not hold. We apply a number of preprocessing steps in order to enhance the range data so that it fits into the assumption of the one-to-one relationship.
Results and discussion
The Pylon Grid detector was used as a core component in the people counting system that we built using the Bumblebee2 stereo camera and the C# programming language. The performance of the system was tested in three outdoor scenes:
- camera height 5.2 m, daylight, with sub-pixel interpolation turned on;
- camera height 3.4 m, daylight;
- camera height 3.4 m, at night with some artificial light.
The camera was manually calibrated to the scenes and the range images were extracted with a proprietary library from Point
Conclusion
This work confirms that range images captured from a vertically oriented camera provide an excellent basis for pedestrian detection and tracking. It also shows that head detection can be performed efficiently and accurately, solely by analyzing 3-D range data in a context-independent manner, examining only the individual frame.
The work contributes a fast and accurate Pylon Grid algorithm with linear complexity for local minima detection in range images. The algorithm is guaranteed to
Acknowledgments
We want to thank Prof. Guilherme A.S. Pereira from the Department of Electrical Engineering at the Universidade Federal de Minas Gerais for lending us the Bumblebee2 stereo camera from his laboratory. We also thank the excellent support team of Point Grey Research, who helped us resolve very specific problems regarding offline processing of the stereo footage.
Special thanks to Anderson Will Fortunato for helping us with the experiments. Special thanks to Gabriel Machado Fonceca and his
Rusi Antonov Filipov completed his M.Sc. in computer science at the Karlsruhe University of Applied Sciences in 2009. During his master's thesis at CEFET-MG he developed a people counting system based on stereo vision together with Flavio Cardeal and Marco Aurélio. His research interests include computer vision, programming languages, algorithms and software engineering.
References (14)
- et al., A method for counting moving people in video surveillance videos, EURASIP J. Adv. Signal Process. (2010)
- et al., Tracking multiple occluding people by localizing on multiple scene planes, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
- et al., Segmentation and tracking of multiple humans in crowded environments, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
- et al., Tracking people by learning their appearance, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- D.B. Yang, H.H. González-Baños, L.J. Guibas, Counting people in crowds with a real-time network of simple image...
- A. Albiol, M.J. Silla, A. Albiol, J.M. Mossi, Video analysis using corner motion statistics, in: Proceedings of the...
- et al., A novel method for tracking and counting pedestrians in real-time using a single camera, IEEE Trans. Veh. Technol. (2001)
Flávio Luis Cardeal Pádua received the Bachelor's degree in electrical engineering from the Universidade Federal de Minas Gerais (UFMG), Brazil, in 1999, and the M.Sc. and Ph.D. degrees in computer science from the same university, in 2002 and 2005, respectively. He has been an adjunct professor of computer engineering at the Centro Federal de Educação Tecnológica de Minas Gerais (CEFET-MG) since 2005. From 2001 to 2003, he worked at Oi S/A in Brazil where he managed engineering projects for increasing the reliability and availability of telecommunication services. From 1998 to 1999, he participated in an undergraduate program at the Technical University of Berlin in Germany, sponsored by the governments of Brazil and Germany. During this period, he worked as a visiting scientist at the Institute for Machine Tools and Factory Management (IWF). His research interests include computer vision and video information processing, with special focus on visual motion analysis and three-dimensional scene analysis from video.
Marco Aurélio Buono Carone is a distinguished computer engineering student at the Department of Computing in the Federal Center for Technological Education of Minas Gerais (CEFET-MG). He has contributed to the research and development of various projects, including two people counting systems based on computer vision. His research interests include computer vision, 3-D computer graphics, game programming, artificial intelligence and web applications.