Elsevier

Neurocomputing

Volume 100, 16 January 2013, Pages 19-30
HoGG: Gabor and HoG-based human detection for surveillance in non-controlled environments

https://doi.org/10.1016/j.neucom.2011.12.037

Abstract

A new method (HoGG) for human detection based on Gabor filters and Histograms of Oriented Gradients is presented in this paper. The effect of Gabor preprocessing is analyzed in detail, in particular the improvement it brings to the image's information and its influence on the extracted features. To assess the performance of the proposed method, several alternative algorithms for human detection have been considered. In order to evaluate these techniques in non-controlled environments, a collection of standard databases, well known in the surveillance research community, has been used: PETS 2006, PETS 2007, PETS 2009 and CAVIAR. An exhaustive test design has been built based on two complementary evaluations: an evaluation oriented to counting people and a novel evaluation oriented to identification. Moreover, to study the performance of the Gabor-based preprocessing, a test adding Gabor filters to other local feature extraction methods, such as Steerable filters and the SIFT method, has been implemented. The HoGG method has achieved good performance regardless of the difficulty of the images (occlusions, overlapping, carrying baggage, etc.). The proposed method has surpassed the alternative techniques in most of the analyzed situations. When the Gabor preprocessing is introduced into other local feature extraction methods, they detect the relevant information better because the human shape is enhanced. The results show that using Gabor preprocessing in techniques based on features like orientation or magnitude of gradient improves their performance. Given the excellent results obtained by HoGG in the identification-oriented evaluation, the method presented in this paper should be taken into account in the future design of intelligent surveillance systems.

Introduction

As an active research topic in computer vision, visual surveillance in dynamic scenes attempts to detect, recognize, and track certain objects from image sequences, and more generally to understand and describe object behavior. The aim of this research is to develop intelligent visual surveillance to replace traditional passive video surveillance, which has proven ineffective when the number of cameras exceeds the capability of human operators to monitor them [13]. Since humans are the main actors in daily activities of interest, one of the main tasks in video surveillance systems is people detection. This task can be very complex in situations involving critical infrastructures with a high density of people, such as airports and subway and train stations. These are places where there are many people and where any attack or suspicious situation could turn out to be dangerous. Consequently, this problem has been an active area of research in recent years [38]. Visual surveillance is a challenging scientific problem and an important field of application for computer vision. With increasing processor power, more attention has been given to the development of real-time smart surveillance systems. In addition, surveillance cameras have already been installed in many locations such as highways, streets, stores, ATMs, homes and offices, and are socially accepted. The ability to analyze and understand human motion and to recognize human activities is key for a machine to interact intelligently and effortlessly with humans in a social environment [32]. Given the recent growth in security needs due to terrorist attacks, thefts, etc., it is crucial to have a reliable system for detecting people. However, this is a challenging problem due to environmental factors such as illumination changes and occlusions, and to the fact that human bodies are non-rigid and highly articulated [36].
In addition, working in non-controlled environments increases the complexity of human detection tasks. In fact, visual surveillance in dynamic scenes, especially of humans and vehicles, is currently one of the most active research topics in computer vision [13]. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behavior, interactive surveillance using multiple cameras, etc.

Currently, several human detection algorithms display a good performance in controlled conditions (see for instance [21], [5], [39]). However, when these algorithms are applied to real scenarios there is a sharp decrease in their performance. Hence, the task of human detection in non-controlled environments remains unsolved [30].

There are different approaches to computer vision-based video surveillance systems. One common solution is to extract moving objects and then apply an object tracking technique, whilst another widespread approach is more tailored to object recognition. The former solution may perform well in controlled environments, but in non-controlled environments, where the type of the detected object is more relevant, the latter approach may be more suitable. The approach proposed in this paper is focused on human recognition without tracking. Usually, object recognition systems have three different stages: background segmentation, feature extraction, and classification. In the first step, background segmentation and movement detection are applied in order to extract the moving objects from an image. The effectiveness of this task has a big influence on the final result of the surveillance system. In the second step, the feature extraction of moving objects is performed [11]. There are different ways to carry out this task: by using extraction windows, human templates (full body or body parts), histogram features, etc. Finally, in the third step, the features are used to determine whether or not an object is a person. To perform this task, several classifiers have been used: AdaBoost, Support Vector Machines (SVM) and the K-Nearest Neighbors algorithm, among others, all well known in the literature. In all computer vision systems the database selected to evaluate the system performance is highly relevant. Usually just one database is considered and the system parameters are fixed to the specific conditions of this database. That is, the generalization capacity of the system is not considered.

As the main contribution of this paper, a new method for human detection in non-controlled environments is presented. The proposed technique is based on Histogram of Oriented Gradient (HoG) [5] and Gabor filters [20]. For simplicity, this new method will henceforth be called HoGG. The effect of Gabor preprocessing is analyzed in detail, in particular the improvement it brings to the image's information and its influence on the extracted features. In order to compare HoGG with other state-of-the-art human detection methods [5], [21], [35] from a global point of view, several standard databases have been considered. Thus, all methods are tested in a wide variety of real conditions. To study the performance of every human detection method in each database, a detailed experimental design has been put in place. Also, two evaluations have been carried out to compare the different methods: an evaluation oriented to counting people and a novel evaluation oriented to identification, which is proposed here. Moreover, an evaluation of the effectiveness of preprocessing images with Gabor filters in some alternative methods, such as Steerable filters [9] and SIFT [25], is also presented. The paper is organized as follows. Section 2 presents a brief review of state-of-the-art human detection algorithms. Section 3 describes the databases used for the tests. The proposed method, HoGG, is detailed in Section 4. The alternative methods and the experiments are shown in Section 5. Section 6 shows the results and, finally, the conclusions are presented in Section 7.

Overview of human detection

Several works dealing with human detection have been published in the last few years. These works can be classified into two different categories: those that use feature extraction windows and those that use human body models, templates or any other human-figure pattern [30].

First, the window-based feature extraction methods are discussed. In [35], the well-known Viola & Jones method (V&J) is presented. Although this is a robust and extremely rapid object detection method focused on face

Databases description

There have been numerous distributed efforts to standardize the performance evaluation of computer vision-based surveillance algorithms [27]. In this paper, the databases: PETS 2006, PETS 2007, PETS 2009 and CAVIAR have been considered for the testing of human detection algorithms. These databases are recognized as standard in the research of human detection and identification. They were all obtained in public spaces: a mall, a train station, an airport and a park (outdoor), and the different

HoGG: Gabor and HoG based human detection

In this section, the proposed method, Gabor and Histogram of Oriented Gradients based human detection (HoGG), is presented. The objective of this work is to develop an effective and efficient algorithm for human detection. The proposed method is based on the extraction of features by using Gabor filters and the HoG algorithm. Fig. 2 shows a general scheme of the HoGG method. First, the images are convolved with a Gabor filter bank. Next, the HoG algorithm is applied to the
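The two steps named above (filtering with a Gabor bank, then HoG) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the kernel size, wavelength, sigma, number of orientations, the per-pixel maximum used to merge the filter responses, and the simplified per-cell HoG without block normalization are all assumptions made for illustration, and the function names are hypothetical. It uses only the standard library so that it runs anywhere; a practical system would rely on an optimized image library.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (cosine) part of a 2-D Gabor filter as a size x size list of lists."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def convolve(image, kernel):
    """Same-size 2-D filtering with zero padding.

    The cosine Gabor kernel is symmetric under a 180-degree rotation,
    so this correlation loop equals a true convolution.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki, krow in enumerate(kernel):
                ii = i + ki - half
                if 0 <= ii < h:
                    for kj, kval in enumerate(krow):
                        jj = j + kj - half
                        if 0 <= jj < w:
                            acc += kval * image[ii][jj]
            out[i][j] = acc
    return out

def hog_cell_histograms(image, cell=8, bins=9):
    """Unsigned-orientation gradient histograms, L2-normalized per cell."""
    h, w = len(image), len(image[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for i in range(cy, cy + cell):
                for j in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = image[i][min(j + 1, w - 1)] - image[i][max(j - 1, 0)]
                    gy = image[min(i + 1, h - 1)][j] - image[max(i - 1, 0)][j]
                    angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
                    b = min(int(angle / math.pi * bins), bins - 1)
                    hist[b] += math.hypot(gx, gy)  # vote weighted by magnitude
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-9
            feats.extend(v / norm for v in hist)
    return feats

def hogg_features(image, orientations=4):
    """Gabor bank -> per-pixel max of |response| (assumed merge rule) -> HoG."""
    bank = [gabor_kernel(7, 4.0, k * math.pi / orientations, 2.0)
            for k in range(orientations)]
    responses = [convolve(image, k) for k in bank]
    h, w = len(image), len(image[0])
    enhanced = [[max(abs(r[i][j]) for r in responses) for j in range(w)]
                for i in range(h)]
    return hog_cell_histograms(enhanced)

# Toy 16x16 image containing a bright vertical bar.
toy = [[1.0 if 6 <= j < 10 else 0.0 for j in range(16)] for i in range(16)]
print(len(hogg_features(toy)))
```

With 8-pixel cells and 9 orientation bins, the toy image yields four cells and therefore a 36-value descriptor, which would then be passed to a classifier.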

Experiments

In order to evaluate its performance, the HoGG method will be compared with a variety of human detection algorithms, including some of the better-known ones: Rapid Object Detection using a Boosted Cascade of Simple Features (V&J) [35], Improvements of Object Detection using Boosted Histograms [21], and Histograms of Oriented Gradients for Human Detection [5]. In the case of the V&J algorithm, two variations have been considered. In the first variation (V&J-1), the algorithm is trained with a

Results and discussion

In this section, the performance of HoGG in the databases described in Section 3 is compared with the alternative methods presented in Section 5. The results of the three evaluations described in Section 5.1 are then introduced. First, a general performance of each method is presented in a Precision–Recall curve, and then more detailed results are shown in terms of the optimal F-measure considering all the databases.
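As a reminder of how a Precision-Recall curve and an optimal F-measure relate, the following Python sketch sweeps a confidence threshold over scored detections. It is illustrative only: the function names and the toy scores are hypothetical, and the rule for matching a detection to a ground-truth person (which decides the true/false flag) is assumed to be given, since it is not described in this excerpt.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (harmonic mean) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def sweep_thresholds(scored, n_positives):
    """Build a Precision-Recall curve and find the best F-measure.

    scored: list of (score, is_true_positive) pairs, one per detection.
    n_positives: number of ground-truth people in the test set.
    Returns (curve, best), where curve is a list of
    (threshold, precision, recall, f_measure) tuples and best is the
    entry with the highest F-measure.
    """
    curve = []
    tp = fp = 0
    # Lowering the threshold admits detections in order of decreasing score.
    for score, is_tp in sorted(scored, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        p, r, f = precision_recall_f(tp, fp, n_positives - tp)
        curve.append((score, p, r, f))
    best = max(curve, key=lambda point: point[3])
    return curve, best

# Hypothetical detector output: five detections, four people in the ground truth.
detections = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
curve, best = sweep_thresholds(detections, n_positives=4)
print(best)  # threshold 0.5 gives precision 0.75, recall 0.75, F-measure 0.75
```

Each point of the curve corresponds to one threshold; the optimal F-measure is simply the best trade-off found along that curve.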

Fig. 3 shows the Precision–Recall curve of each human detection method using a

Conclusions

A novel human detection method for surveillance scenarios in non-controlled environments has been presented in this paper. The HoGG human detection method is based on the use of Histogram of Oriented Gradients and the extraction of information by Gabor filters. The performance of the presented method has been compared with several state-of-the-art algorithms: Rapid Object Detection using a Boosted Cascade of Simple Features, Improvements of Object Detection using Boosted Histograms, and

Acknowledgments

This research is supported by the Spanish Government: VULCANO (TEC2009-10639-C04-04) and SIBAR (TSI-020100-2010-574). We thank the anonymous reviewers and the editor for their suggestions.

Cristina Conde received the M.E. Physics degree from University Complutense, Madrid, Spain, in 1999, and the Ph.D. degree in Computer Science from the University Rey Juan Carlos, Madrid, Spain, in 2006. Her primary research interests include Computer Vision, Image Processing, Biometrics, Pattern Recognition, Intelligent Traffic Systems and Bio-inspired Computer Systems.

References (39)

  • M. Enzweiler, D. Gavrila, Monocular pedestrian detection: survey and experiments, Trans. Pattern Anal. Mach. Intell....
  • J. Ferryman, D. Tweed, Overview of the PETS2007 challenge, in: 10th IEEE International Workshop on PETS, 2007, pp....
  • W.T. Freeman et al., The design and use of steerable filters, IEEE Trans. Pattern Anal. Mach. Intell. (1991)
  • D. Gavrila, Pedestrian detection from a moving vehicle, in: ECCV, 2000, pp....
  • D. Gerónimo et al., Survey of pedestrian detection for advanced driver assistance systems, Trans. Pattern Anal. Mach. Intell. (2010)
  • R.C. Holte, Very simple classification rules perform well on most commonly used datasets, Mach. Learn. (1993)
  • W. Hu et al., A survey on visual surveillance of object motion and behaviors, Trans. Syst. Man Cybern. (2004)
  • INRIA, 2005, INRIA Person Dataset. Online:...
  • W. Jiang et al., Efficient edge detection using simplified Gabor wavelets, IEEE Trans. Syst. Man Cybern. Part B (2009)

    Daniela Moctezuma has been a Ph.D. student at Rey Juan Carlos University since 2009. She received the B.S. in Computer Sciences from the Technology Institute of Los Mochis, Sinaloa, Mexico, in 2006, and the M.S. degree from the National Center of Research and Technologic Development, Cuernavaca, Mexico, in 2009. Her primary research interests include Computer Vision, Video Surveillance Systems, Soft-Biometrics, Pattern Recognition and Image Processing.

    Isaac Martín de Diego has been an associate professor with the Face Recognition and Artificial Vision (FRAV) group at the University Rey Juan Carlos since 2006. He received a Ph.D. in Mathematical Engineering and a B.Sc. in Statistics from the University Carlos III of Madrid. Before joining the FRAV group, he was a teaching assistant at the University Carlos III of Madrid from 1999 to 2005. His research interests include Pattern Recognition techniques, Representation of the Information, Combination of Information, and Data Mining.

    Enrique Cabello received the B.S. in Physics (Electronics) from the University of Salamanca and the Ph.D. degree from the Polytechnical University of Madrid (both in Spain). His Ph.D. thesis was awarded the Extraordinary Prize as one of the best theses. In 1990, he joined the University of Salamanca, where he was an Assistant Professor with the School of Sciences. Since 1998 he has been at the University Rey Juan Carlos (in Spain), and since 2001 he has been the Coordinator of the FRAV research group. He has been a main researcher in projects funded by national and international institutions and companies. His research interests include image and video analysis, pattern recognition and machine learning.
