Joint global–local information pedestrian detection algorithm for outdoor video surveillance

https://doi.org/10.1016/j.jvcir.2014.11.009

Highlights

  • Our paper fully exploits global information to improve recognition performance.

  • LBP extracted from the low-frequency part of the image is used to suppress interference.

  • The topological structure of the human body is applied to LBP to make it more discriminative.

  • Our algorithm is suitable for small-scale pedestrians in outdoor surveillance.

Abstract

In practical outdoor surveillance, pedestrians usually appear small. Small-scale pedestrian detection for outdoor surveillance is an important but difficult problem because of the limited information available and the background interference. According to human cognition, global information is important for pedestrian detection. Therefore, a joint global–local information pedestrian detection algorithm is proposed to fully exploit the global information. The LBP feature is explicitly extracted from the low-frequency component of the original images and used as global information to suppress background interference and enrich the description of pedestrians. Moreover, a structure-LBP is proposed that applies the inherent topological structure of the human body to LBP. The structure-LBP feature extracted from the original images achieves a more discriminative description of pedestrians than the original LBP. The experimental results demonstrate that the proposed algorithm improves the overall recognition performance for small-scale pedestrians.

Introduction

Pedestrian detection has been an active research area in recent years and has been widely applied in video surveillance [1], collision avoidance on vehicles [2], robotics applications and advanced assistive technology for the visually impaired. Its major purpose is to automatically detect persons in a video, and it is a fundamental technology for counting the flow of people in stations and for human activity analysis [3], both of which are important for outdoor video surveillance. However, in outdoor video surveillance, pedestrians are usually small, smaller than the sizes on which current pedestrian detection research focuses (as shown in Fig. 1). This makes it challenging to apply the current pedestrian detection algorithms [4], [7], [8], [10], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [44], [45], [46], [47], [48], [52], [53] directly in practical applications.

Recently, many pedestrian detection algorithms have been proposed [4], [7], [8], [10], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. They can be divided into three categories, namely the human contour modeling algorithms [13], [26], the template matching algorithms [14], and the statistical classification algorithms [4], [7], [8], [10], [13], [22]. Compared with the other two categories, the statistical classification algorithms use machine learning theory to obtain a pedestrian classifier with low computational complexity and good generalization. The current literature on statistical classification for pedestrian detection is typified by feature extraction followed by a trainable classifier such as an SVM [5], [9] or a boosted classifier [10]. Note that the algorithm proposed in this paper belongs to the statistical classification algorithms.

Feature extraction is one of the key issues in the statistical classification algorithms. The common features include the gray-scale feature (e.g., Haar [19]), the texture feature (e.g., the Local Binary Pattern, LBP [29], [34]), and the shape feature [4], [28], [32] (e.g., the Histogram of Oriented Gradients, HOG [4]). Because different features contain different information [33], these features are usually combined to achieve a more effective feature description than any single feature provides [7], [8], [27], [29].

However, current research mainly focuses on public datasets (e.g., MIT [12], INRIA [4], Caltech [11]), in which pedestrians are relatively large compared with their actual size in practical surveillance systems. In this paper, a pedestrian bigger than 18 × 36 pixels and smaller than 25 × 50 pixels is considered a small-scale pedestrian, and pedestrians bigger than 25 × 50 pixels are called normal-scale pedestrians. The pedestrian size comparison is shown in Fig. 2. Although current methods can be applied directly to small-scale pedestrians [6], [7], [8], doing so fails to achieve adequate performance. According to the experimental results of [22], the best feature detector achieves a 21% mean miss rate for normal-scale pedestrians in the Caltech Pedestrian Benchmark [11], while the mean miss rate rapidly increases to 73% for small-scale pedestrians.
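The size thresholds above can be written as a small helper. This is only an illustrative sketch: the function name `pedestrian_scale` and the label strings are hypothetical, and it assumes that both width and height must reach a threshold and that the thresholds are inclusive lower bounds, since the text does not say how boxes exactly at, or straddling, a threshold are treated.

```python
def pedestrian_scale(width: int, height: int) -> str:
    """Classify a bounding box by the size thresholds quoted in the text."""
    # Assumption: a box reaches a threshold only if BOTH dimensions reach it;
    # the paper does not spell out mixed cases such as 30 x 40 pixels.
    if width >= 25 and height >= 50:
        return "normal"
    if width >= 18 and height >= 36:
        return "small"
    return "too_small"
```

For example, a 20 × 40 pixel box falls in the small-scale range, while a 64 × 128 pixel window (a size commonly used for normal-scale training crops) is classified as normal-scale.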

Therefore, the decline in performance with decreasing scale is the major bottleneck for current pedestrian detection research [22], [56]. There are two major reasons, which are analyzed below.

Firstly, in practical surveillance systems, small-scale pedestrians are captured at a long distance and tend to be blurry due to the long distance and the background interference. Thus, only a small number of features are extracted when conventional methods designed for normal-scale pedestrians are applied directly to small-scale pedestrians. This can be observed in Fig. 2, which compares the amount of LBP features [29], [34] extracted from pedestrians of two different scales. The LBP features extracted from the normal-scale pedestrians are clearly more plentiful and discriminative than those extracted from the small-scale pedestrian. Therefore, it is necessary to further exploit the limited information and make full use of it to improve the recognition performance for small-scale pedestrians.

Secondly, the conventional features were proposed for normal-scale pedestrians. However, different features are designed to represent different characteristics, and not every kind of feature can be directly applied to describe small-scale pedestrians. As shown in Fig. 3, the HOG feature is good at describing normal-scale pedestrians but fails to describe small-scale pedestrians well due to the degradation of the edge/local shape information. According to our experimental results, the LBP feature achieves relatively better performance for small-scale pedestrians than the other features. However, the direct application of the LBP feature to small-scale pedestrian detection is still far from meeting the requirements of practical applications. The detailed analysis will be given in the next section.

Note that humans can recognize pedestrians from a relatively long distance regardless of the background interference, which provides an inspiration for improving the recognition performance for small-scale pedestrians. According to current research, human cognition proceeds from the global scope to the local part [39], [40], [41]. Specifically, a coarse analysis is made over a wide range of global information, and a fine analysis is then carried out by capturing the local details. This is because the global information (e.g., structures, contours, and topologies) carries the salient characteristics of pedestrians and plays an important role in pedestrian detection [39].

However, the global information is underutilized in current research. On the one hand, there is no generally accepted high-level feature that can accurately represent the structures, contours, and topologies. In most current research, the high-level features are described indirectly through low-level features (e.g., HOG, LBP), and the extraction of the high-level features from the low-level features is carried out implicitly by the classifier training.

On the other hand, several algorithms [5], [6], [20], [24], [30] use part models based on inherent structural information to classify pedestrians. In addition, the head-shoulder structure [57] is a widely used global feature due to its stability. However, these algorithms target normal-scale pedestrian detection and fail on small-scale pedestrians. Moreover, for small-scale pedestrians, the original images are usually blurred, so the global and the local information are easily mixed, and the global information may be submerged by noise and blurring. Extracting features directly from the original images therefore cannot separate out the global information.

Therefore, it is important to fully exploit and efficiently utilize the global information. This paper proposes a joint global–local information pedestrian detection algorithm that aims to improve the performance of small-scale pedestrian detection. The proposed algorithm makes two main contributions. Firstly, the global–local integrated pedestrian detection model is proposed, which separates the low-frequency component from the original images through Gaussian low-pass filtering and extracts the LBP feature from that low-frequency component. The extracted features are explicitly used as the global information and are integrated with the local features to enrich the feature description of small-scale pedestrians. The proposed global–local integrated pedestrian detection model can suppress the background interference. Secondly, according to prior knowledge about pedestrians, a structure-LBP detector is proposed that applies the inherent topological structure of the human body, combining the local features and the spatial information to enhance the discrimination of high-level features.
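The first contribution can be illustrated with a minimal pure-Python sketch: blur the image with a Gaussian low-pass filter, compute an LBP histogram on both the original and the blurred image, and join the two. Several details here are assumptions rather than the authors' settings: the basic 3 × 3 LBP variant, the 256-bin normalised histogram, the default `sigma` of 1.0, the border clamping, and plain concatenation as the way of integrating global and local features. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gaussian_kernel(sigma: float) -> List[float]:
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img: Image, kernel: Sequence[float]) -> Image:
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [
        [
            sum(k * row[min(max(x + j - radius, 0), width - 1)] for j, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in img
    ]


def transpose(img: Image) -> Image:
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


def gaussian_low_pass(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur: filter along rows, then along columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


# Clockwise 8-neighbourhood offsets (dy, dx), starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Image) -> List[float]:
    """Normalised 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    hist = [0.0] * 256
    height, width = len(img), len(img[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the centre sets its bit.
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    count = (height - 2) * (width - 2)
    return [h / count for h in hist]


def global_local_feature(img: Image, sigmas: Sequence[float] = (1.0,)) -> List[float]:
    """LBP of the original image (local) followed by LBP of each blurred copy (global)."""
    feature = lbp_histogram(img)
    for sigma in sigmas:
        feature += lbp_histogram(gaussian_low_pass(img, sigma))
    return feature
```

With the default single `sigma`, the resulting vector has 512 entries: 256 for the original image and 256 for its low-frequency component. Passing more values in `sigmas` appends one further 256-bin histogram per blur scale.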

The experimental results demonstrate that, compared with the state of the art, the proposed algorithm achieves an accuracy of 86.15% (improved by 15.96% on average) while maintaining a relatively high precision of 84.09% (improved by 3.82% on average). Moreover, the proposed algorithm achieves a relatively high recall rate of 81.49% (improved by 33.78% on average). The AUC, the numeric summary of the ROC curve, reaches 92.47% (improved by 15.46% on average). On the whole, our algorithm achieves high accuracy, recall rate and AUC while maintaining relatively high precision, indicating stable and discriminative recognition performance.
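For readers less familiar with these four measures, the sketch below shows how they are conventionally computed for a binary pedestrian/non-pedestrian classifier. It uses the standard textbook definitions, not code from the paper; the function names are hypothetical, and the AUC is computed with the pairwise-ranking formulation (ties count one half), which equals the area under the ROC curve.

```python
from typing import Dict, Sequence


def confusion_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = pedestrian, 0 = other)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # pedestrians found
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # non-pedestrians rejected
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false alarms
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed pedestrians
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

Precision penalises false alarms and recall penalises missed pedestrians, which is why the text reports both alongside accuracy.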

It should be noted that the proposed algorithm is intended to be one part of pedestrian detection in practical video surveillance, where pedestrian detection is usually not isolated and can benefit from relatively mature background modeling algorithms. Thus, this paper abandons the conventional sliding window method [22], which produces numerous background window candidates for the detection stage. Instead, an existing motion detection algorithm is applied to obtain only the moving target candidates, which excludes static targets and thus improves detection accuracy and efficiency.
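The idea of replacing sliding windows with motion-based candidates can be sketched as follows. The text does not name the motion detection algorithm it uses, so this example substitutes the simplest stand-in, thresholded differencing against a fixed background frame, and returns a single bounding box around all changed pixels. The function name, the `threshold` default of 25 and the single-box simplification are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale frame as rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def moving_region(background: Frame, frame: Frame, threshold: int = 25) -> Optional[Box]:
    """Bounding box of pixels that differ from the background by more than threshold."""
    xs: List[int] = []
    ys: List[int] = []
    for y, (bg_row, row) in enumerate(zip(background, frame)):
        for x, (bg, px) in enumerate(zip(bg_row, row)):
            if abs(px - bg) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing moved, so no candidate reaches the classifier
    return min(xs), min(ys), max(xs), max(ys)
```

Only the returned region would be handed to the pedestrian classifier, so static background never generates candidate windows. A practical system would use a more robust background model and split the changed pixels into separate connected regions.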

The paper is organized as follows. Starting with the descriptions of related work in Section 1, we present the global–local integrated pedestrian detection model (GLI) in Section 2 and the proposed structure-LBP in Section 3. The implementation of the proposed algorithm is introduced in Section 4. The performance of the proposed algorithm is evaluated in Section 5. Two discussions are given in Section 6, and the paper is concluded in the final section.

Section snippets

Global–local integrated pedestrian detection model

In practical surveillance systems, the actual size of the pedestrians is relatively small. However, the existing pedestrian detection algorithms were designed for normal-scale pedestrians. Applying the existing algorithms directly to small-scale pedestrians results in poor recognition performance. This can be confirmed through the following experiments, wherein the experimental setting is the same as that described in Section 5. The experimental results are

The proposed structure-LBP

This paper focuses on small-scale pedestrian detection, for which different kinds of features differ in effectiveness. The shape features (e.g., HOG) perform well on big-scale pedestrians but fail on the small-scale pedestrians due to the degradation of the edge/local shape information. The gray-scale features (e.g., Haar) fail to obtain the appropriate performance on the large-scale pedestrians due to the variation of human pose, compared with its excellent

The summary of the proposed algorithm

In practical video surveillance, pedestrian detection is usually not isolated and can benefit from relatively mature background modeling algorithms. Thus, this paper abandons the conventional sliding window method, which produces numerous background window candidates for the recognition stage, and applies an existing motion detection algorithm to obtain only the moving target candidates, largely excluding non-pedestrian static targets and thus improving detection accuracy and efficiency.

Experiments

In this section, the performance of the proposed algorithm is fully evaluated by comparison with seven algorithms. Four of them are single-feature algorithms, namely Haar (labeled as VJ) [56], LBP [29], HOG [4] and part-based HOG (labeled as PartHOG) [5]. The other three, namely Hoglbp [8], MultiFtr-Css [1] and VeryFast [23], are recently proposed multi-feature algorithms. Our approach is tested on an Intel Pentium D 2.00 GHz CPU with 2 GB RAM. We evaluated the proposed algorithm based on 4 public

Discussions

This section will present three discussions about the proposed algorithm. Firstly, the proposed algorithm focuses on the target recognition after the target detection. It can be combined with conventional target detection methods. For the data preparation, this paper abandons the conventional sliding window method, because it expends a great deal of time to produce hundreds of thousands of background window candidates for the detection [22]. On the other hand, the pedestrian detection is usually

Conclusion

According to human cognition, the global information is important for small-scale pedestrian detection. The proposed joint global–local information pedestrian detection algorithm fully exploits and utilizes the global information to improve detection performance. In order to suppress the interference of the local information, the GLI is proposed to extract LBP features not only from the original images but also from low-frequency components at two scales. Moreover, the structure-LBP is

Acknowledgments

This work was partially supported by the National Science Fund for Distinguished Young Scholars (No. 61125206), the National Natural Science Foundation of China (No. 61370121), the National Hi-Tech Research and Development Program (863 Program) of China (No. 2014AA015102), and Outstanding Tutors for doctoral dissertations of S&T project in Beijing (No. 20131000602).

References (59)

  • P. Felzenszwalb et al., Object detection with discriminatively trained part based models, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) (2010)
  • W.R. Schwartz, A. Kembhavi, D. Harwood, L.S. Davis, Human detection using partial least squares analysis, in:...
  • S. Walk, N. Majer, K. Schindler, B. Schiele, New features and insights for pedestrian detection, in: IEEE Conference on...
  • X. Wang, T.X. Han, S. Yan, An HOG-LBP human detector with partial occlusion handling, in: International Conference on...
  • S. Maji, A.C. Berg, J. Malik. Classification using intersection kernel support vector machines is efficient, in: IEEE...
  • P. Dollár, Z. Tu, P. Perona, S. Belongie, Integral channel features, in: British Machine Vision Conference (BMVC),...
  • P. Dollár, C. Wojek, B. Schiele, P. Perona, Pedestrian detection: a benchmark, in: IEEE Conference on Computer Vision...
  • C. Papageorgiou et al., A trainable system for object detection, Int. J. Comput. Vision (IJCV) (2000)
  • C. Curio et al., Walking pedestrian recognition, IEEE Trans. Intell. Transp. Syst. (2000)
  • D.M. Gavrila, Pedestrian detection from a moving vehicle
  • D. Hoiem, A.A. Efros, M. Hebert, Putting objects in perspective, in: IEEE Conference on Computer Vision and Pattern...
  • Y. Song, X. Feng, P. Perona, Towards detection of human motion, in: IEEE Conference on Computer Vision and Pattern...
  • P. Viola et al., Detecting pedestrians using patterns of motion and appearance, Int. J. Comput. Vision (IJCV) (2003)
  • K. Mikolajczyk, C. Schmid, A. Zisserman, Human detection based on a probabilistic assembly of robust part detectors,...
  • P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, Pedestrian detection with unsupervised multi-stage feature...
  • P. Dollár et al., Pedestrian detection: an evaluation of the state of the art, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) (2012)
  • R. Benenson, M. Mathias, R. Timofte, L. Van Gool, Pedestrian detection at 100 frames per second, in: IEEE Conference on...
  • P. Felzenszwalb, R. Girshick, D. McAllester, Cascade object detection with deformable part models, in: IEEE Conference...

    This paper has been recommended for acceptance by M.T. Sun.