Signal Processing

Volume 124, July 2016, Pages 72-80

A unified model sharing framework for moving object detection

https://doi.org/10.1016/j.sigpro.2015.10.011

Highlights

  • The sharing mechanism realizes many-to-one correspondence between pixels and models.

  • The model sharing framework reduces the number of models and enhances precision.

  • The model sharing framework embeds existing approaches and improves their performance.

Abstract

Millions of surveillance cameras have been installed in public areas, producing vast amounts of video data every day. There is an urgent need for intelligent techniques that automatically detect and segment moving objects, a capability with wide applications. Various approaches to moving object detection based on background modeling have been developed in the literature. Most of them focus on temporal information but partly or totally ignore spatial information, making them sensitive to noise and background motion. In this paper, we propose a unified model sharing framework for moving object detection. To exploit the spatial-temporal correlation across different pixels, we first establish a many-to-one correspondence by sharing models between pixels, and a pixel is labeled as foreground or background by searching for the optimal matched model in its neighborhood. A random sampling strategy is then introduced for online update of the shared models. In this way, we can reduce the total number of models dramatically and match a proper model to each pixel accurately. Furthermore, existing approaches can be naturally embedded into the proposed sharing framework. Two popular approaches, the statistical model and the sample consensus model, are used to verify its effectiveness. Experiments and comparisons on the ChangeDetection 2014 benchmark demonstrate the superiority of the model sharing solution.

Introduction

Nowadays an increasing number of surveillance cameras are installed in public places and produce massive amounts of video data every day. Efficiently detecting and analyzing objects of interest, such as people and vehicles, across large-scale surveillance videos is therefore of great significance and has become an active research topic in computer vision. As a fundamental task in video processing, moving object detection has been widely investigated, and background subtraction is the most common solution: it distinguishes moving objects (the foreground) from the scene (the background), typically on the basis of appearance modeling and updates in local or global areas. As the preprocessing step of the video analysis pipeline, background subtraction heavily affects the subsequent steps and the overall results.

Over the past years, various approaches, benchmarks and libraries have been developed, which attests to the importance of background subtraction. Background subtraction can be approached in many different ways. To allow convenient, high-speed implementations, most modern approaches are based on pixel level modeling and fully exploit temporal information to build background models. Under the assumption that a pixel is independent of its adjacent pixels, pixel based background subtraction has been widely investigated, e.g., the Gaussian Mixture Model (GMM) [1], [2], Kernel Density Estimation (KDE) [3], and non-parametric approaches based on sample consensus: the Visual Background Extractor (ViBe) [4], the Pixel-Based Adaptive Segmenter (PBAS) [5] and the Self-Balanced SENsitivity SEgmenter (SuBSENSE) [6]. Although pixel based approaches are effective and easy to bootstrap, they ignore the spatial relationship between pixels and are thus sensitive to noise and background motion. Traditional pixel based approaches establish and update a model for each pixel from the historical values of that pixel alone. Ignoring the spatial relationship between pixels, such a model will classify a pixel as foreground whenever its value changes due to background noise or local motion; for example, pixel based approaches classify the pixels of tree leaves as foreground when the tree sways in the wind. Without considering the spatial correlation between adjacent pixels, pixel based models are not robust to noise and background motion. In order to exploit the information around a pixel, some region based approaches [7], [8] were proposed that incorporate the adjacent pixels around a central pixel. This context information enhances robustness to background noise and illumination changes, e.g., block descriptors [8] and local binary similarity patterns (LBSP) [9]. Other approaches [10], [11] cluster pixels into different classes to build models. However, region based approaches are usually sensitive to the region size and the complexity of the video scene, which inevitably leads to precision loss.
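To make the independence assumption concrete, here is a minimal single-Gaussian sketch of per-pixel background subtraction (a simplification of GMM [1]; the function names, learning rate and threshold are illustrative, not taken from the cited papers):

```python
import numpy as np

def pixel_background_mask(frame, mean, var, k=2.5):
    """Per-pixel Gaussian test: a pixel is background if it lies within
    k standard deviations of its own model; no neighbor is consulted."""
    return (frame - mean) ** 2 <= (k ** 2) * var

def update_pixel_models(frame, mean, var, lr=0.05):
    """Exponential-forgetting update of every pixel's Gaussian."""
    diff = frame - mean
    return mean + lr * diff, var + lr * (diff ** 2 - var)

# Toy usage: a static scene with sensor noise.
frame = np.full((4, 4), 100.0) + np.random.randn(4, 4)
mean, var = np.full((4, 4), 100.0), np.full((4, 4), 4.0)
bg = pixel_background_mask(frame, mean, var)
mean, var = update_pixel_models(frame, mean, var)
```

Because each pixel consults only its own history, a leaf pixel whose intensity changes as the branch sways fails the test and is flagged as foreground, which is precisely the weakness the sharing framework targets.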

Our key observation is that it is not necessary to build a background model at every position, since a model can easily be shared by neighboring pixels with similar appearance. Pixel based models are superfluous in number and sensitive to background noise and movement, while region based models suffer from the choice of region size and from precision loss. To fully exploit the spatial-temporal correlation across different pixels and accurately find a model for each pixel, we propose a novel framework that learns shared models for moving object detection. On one hand, we argue that a model can be dynamically shared by different pixels in different frames, because adjacent pixels have similar patterns in space and time; it is not necessary to build a background model for each pixel in a texture-consistent region. The sharing framework dynamically associates pixels with models and thereby reduces the total number of models. On the other hand, existing approaches can be naturally embedded into the model sharing framework. Pixel based background approaches such as GMM [1] and ViBe [4] can be embedded to exploit model sharing whatever feature or model is utilized, whereas region or block based approaches cannot: our framework realizes the many-to-one sharing mechanism between pixels and models without influencing the feature or mathematical representation of the background models. To verify the effectiveness of the proposed framework, we apply it to the statistical model and the sample consensus model. With the sharing framework, the noise caused by small local movements can be effectively eliminated by dynamically searching for a shared model in the neighborhood. Meanwhile, the number of models is reduced remarkably, and the shared models outperform the original ones.
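One way to picture why pixel-level models embed naturally while region or block descriptors do not is a common interface the sharing layer could require of any model; this is a hypothetical sketch, not an API defined by the paper:

```python
from abc import ABC, abstractmethod

class BackgroundModel(ABC):
    """Minimal interface a pixel-level model must expose to the sharing
    layer. GMM-style and ViBe-style models both fit; region or block
    descriptors do not, since their spatial support is part of the
    representation itself."""

    @abstractmethod
    def matches(self, pixel) -> bool:
        """Return True if `pixel` is consistent with this model."""

    @abstractmethod
    def update(self, pixel) -> None:
        """Absorb `pixel` into the model's internal state."""
```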

Section snippets

Related work

As a fundamental step underpinning tasks such as image classification [12], [13], [14], [15], salient object detection [16], image annotation [17] and person re-identification [18], background modeling has received extensive attention in the literature. Among the various background modeling approaches, the GMM is known to be effective in accommodating background variations. GMM [1] is a typical representative of pixel based approaches; it assumes that the historical color intensities at each pixel can be modeled

Model sharing framework

To exploit the spatial-temporal correlation of pixels, we propose a unified model sharing framework for moving object detection. A sharing mechanism models the many-to-one relationship between pixels and models, and each pixel dynamically searches for the best matched model in its neighborhood. Furthermore, the shared models are updated through a random sampling strategy, i.e., randomly selecting a pixel that matches the shared model for the update. The overall flow of the proposed
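The following sketch illustrates this flow under stated assumptions: one model slot per pixel position, a square search neighborhood, and a ViBe-like update probability of 1/16; none of these constants come from the paper. It builds on the hypothetical BackgroundModel interface sketched earlier:

```python
import random

def detect_and_update(frame, models, radius=1, update_prob=1.0 / 16):
    """One frame step of the sharing idea (illustrative sketch).
    models[y][x] is a BackgroundModel; a pixel may match any model
    within `radius`, realizing the many-to-one correspondence
    between pixels and models."""
    h, w = len(frame), len(frame[0])
    foreground = [[True] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Search the neighborhood for an optimal matched model.
            neighbors = [(y + dy, x + dx)
                         for dy in range(-radius, radius + 1)
                         for dx in range(-radius, radius + 1)
                         if 0 <= y + dy < h and 0 <= x + dx < w]
            for ny, nx in neighbors:
                if models[ny][nx].matches(frame[y][x]):
                    foreground[y][x] = False  # a shared model explains it
                    # Random sampling update: only an occasional matching
                    # pixel refreshes the shared model.
                    if random.random() < update_prob:
                        models[ny][nx].update(frame[y][x])
                    break
    return foreground
```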

Shared statistical model

GMM is a classic representative of statistical approaches to moving object detection; it assumes that the temporal evolution of a pixel is modeled by a mixture of Gaussians. Via the sharing mechanism, we can easily embed the statistical model into our framework. Specifically, shared models are built for both background and foreground. Each pixel dynamically searches for the optimal model among neighboring models according to its color feature. Thus the noise caused by local
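A minimal sketch of the neighborhood search for the statistical case, assuming each shared model is a small set of (weight, mean, variance) components over grayscale intensity; the parameterization and threshold are illustrative rather than the paper's notation:

```python
import numpy as np

def match_shared_gaussians(pixel, neighbor_models, k=2.5):
    """Search the Gaussian components of all neighboring shared models
    and return the one closest to `pixel` in normalized distance.
    Each model is a list of (weight, mean, var) triples."""
    best, best_d = None, float("inf")
    for model in neighbor_models:
        for weight, mu, var in model:
            d = abs(pixel - mu) / np.sqrt(var)
            if d < best_d:
                best, best_d = (weight, mu, var), d
    return best_d <= k, best  # (matched?, optimal component)
```

Whether the matched component then counts as background can follow GMM's usual weight-based criterion [1].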

Shared sample consensus model

Moving object detection with sample consensus is a non-parametric approach in which each model is represented by a sequence of historical samples, as in ViBe [4], PBAS [5] and SuBSENSE [6]. Similar to the shared statistical model, the sample consensus model can also be embedded into the sharing framework. For a given pixel, we extract color and texture features. Then we dynamically search for a matched model for each pixel within the shared region. The sharing framework
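Analogously, a hedged sketch of the shared sample consensus test, using scalar intensities and ViBe's published defaults (radius 20, two matches) rather than the paper's exact features and thresholds:

```python
def match_shared_samples(pixel, neighbor_models, radius=20, min_matches=2):
    """ViBe-style consensus test lifted onto shared models: the pixel is
    background if some neighboring model stores at least `min_matches`
    historical samples within `radius` of it. Texture features used in
    the paper are omitted here for brevity."""
    for model in neighbor_models:              # model: list of past samples
        hits = sum(1 for sample in model if abs(pixel - sample) < radius)
        if hits >= min_matches:
            return True, model                 # the matched shared model
    return False, None
```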

Experiments

To evaluate the performance of the proposed model sharing framework, we perform experiments on the public ChangeDetection 2014 benchmark [21], which provides a realistic, camera-captured, diverse set of videos. A total of 53 video sequences with human labeled ground truth are used for testing. The video sequences are grouped into 11 categories by challenge type, as shown in Table 4, Table 5.

Conclusion

To fully exploit the spatial-temporal correlation across different pixels, we propose a simple but effective framework that learns shared models for moving object detection. By dynamically establishing a many-to-one relationship between pixels and models, we allow pixels with similar features to share the same model. Using the sharing framework, the noise caused by small local movements can be effectively eliminated, and the number of models is reduced remarkably. To verify the

Acknowledgments

This work was supported by the 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034, 61332016).

References (24)

  • C. Stauffer, W.E.L. Grimson, Adaptive background mixture models for real-time tracking, in: CVPR, vol. 2, IEEE, Fort...
  • Z. Zivkovic, Improved adaptive Gaussian mixture model for background subtraction, in: ICPR, vol. 2, IEEE, Cambridge, UK,...
  • A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, in: ECCV, Springer, Dublin, Ireland,...
  • O. Barnich, M. Van Droogenbroeck, ViBe: a powerful random technique to estimate the background in video sequences, in: ICASSP,...
  • M. Hofmann, P. Tiefenbacher, G. Rigoll, Background segmentation with feedback: the pixel-based adaptive segmenter, in:...
  • P.-L. St-Charles et al., SuBSENSE: a universal change detection method with local adaptive sensitivity, IEEE Trans. Image Process. (2015)
  • X. Fang, W. Xiong, B. Hu, L. Wang, A moving object detection algorithm based on color information, in: Journal of...
  • S. Varadarajan, P. Miller, H. Zhou, Spatial mixture of Gaussians for dynamic background modelling, in: AVSS, IEEE,...
  • G.-A. Bilodeau, J.-P. Jodoin, Change detection in feature space using local binary similarity patterns, in: CRV, IEEE,...
  • H. Bhaskar et al., Automatic Target Detection Based on Background Modeling Using Adaptive Cluster Density Estimation (2007)
  • B. Valentine et al., An efficient, chromatic clustering-based background model for embedded vision platforms, Comput. Vis. Image Underst. (2010)
  • Y. Luo et al., Manifold regularized multitask learning for semi-supervised multilabel image classification, IEEE Trans. Image Process. (2013)