A unified model sharing framework for moving object detection
Introduction
Nowadays, an increasing number of surveillance cameras are installed in public places and produce massive amounts of video data every day. Efficiently detecting and analyzing objects of interest, such as persons and vehicles, across large-scale surveillance video is therefore of great significance and has become an active research topic in computer vision. As a fundamental task in video processing, moving object detection has been widely investigated, and background subtraction is the most common solution: it distinguishes moving objects (the foreground) from the scene (the background), typically on the basis of appearance modeling and updates in local or global areas. As the preprocessing step of the whole pipeline, background subtraction heavily affects the subsequent steps and the overall results.
Over the past years, various approaches, benchmarks, and libraries have been developed, which attests to the importance of background subtraction. Background subtraction can be approached in many different ways. Toward a convenient, high-speed implementation, most modern approaches are based on pixel-level modeling, in which temporal information is fully exploited to build background models. Under the assumption that a pixel is independent of its adjacent pixels, pixel-based background subtraction has been widely investigated, e.g., the Gaussian Mixture Model (GMM) [1], [2], Kernel Density Estimation (KDE) [3], and non-parametric approaches based on sample consensus: the Visual Background Extractor (ViBe) [4], the Pixel-Based Adaptive Segmenter (PBAS) [5], and the Self-Balanced SENsitivity SEgmenter (SuBSENSE) [6]. Although pixel-based approaches are effective and easy to bootstrap, they ignore the spatial relationship between pixels and are therefore sensitive to noise and background motion. Traditional pixel-based approaches establish and update a model for each pixel using only that pixel's historical values, so a pixel is misclassified as foreground whenever its value changes due to background noise or local motion; for example, the pixels of tree leaves are labeled foreground when the tree shakes in the wind. To exploit the information around a pixel, region-based approaches [7], [8] were proposed that incorporate the adjacent pixels around a central pixel. This context information improves robustness to background noise and illumination changes, e.g., block descriptors [8] and local binary similarity patterns (LBSP) [9]. Other approaches [10], [11] cluster pixels into different classes to build models.
However, region-based approaches are usually sensitive to the region size and the complexity of the video scene, which inevitably leads to a loss of precision.
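To make the pixel-level idea concrete, the following is a minimal sketch (not the code of any cited method) of a per-pixel running-Gaussian background model in the spirit of [1]. Each position keeps its own mean and variance; the array names, learning rate `alpha`, and threshold `k` are illustrative choices.

```python
import numpy as np

def update_pixel_model(frame, mean, var, alpha=0.05, k=2.5):
    """Classify each pixel against its own running Gaussian and update it.

    frame, mean, var: float arrays of identical shape (one model per pixel).
    A pixel is foreground when it deviates more than k standard deviations
    from its model; background pixels are blended into the running statistics.
    """
    dist = np.abs(frame - mean)
    foreground = dist > k * np.sqrt(var)
    bg = ~foreground
    # Update only the pixels that matched the background model.
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * (dist[bg] ** 2 - var[bg])
    return foreground

# Toy sequence: a static background with one bright moving blob.
h, w = 4, 4
mean = np.full((h, w), 50.0)
var = np.full((h, w), 4.0)
frame = np.full((h, w), 50.0)
frame[1, 2] = 200.0  # simulated moving object
mask = update_pixel_model(frame, mean, var)
print(mask[1, 2], mask[0, 0])  # True False
```

Because every position carries its own model, a value change at a single pixel (e.g., a shaking leaf) immediately flips that pixel to foreground, which is exactly the sensitivity the text describes.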
Based on our observations, it is not necessary to build a background model for every position, since a model can easily be shared by neighboring pixels with similar appearance. Pixel-based models establish superfluous models and are sensitive to background noise and motion, while region-based models suffer from the choice of region size and a loss of precision. To fully exploit the spatial-temporal correlation across pixels and accurately assign a model to each pixel, we propose a novel framework that learns shared models for moving object detection. On the one hand, we argue that a model can be dynamically shared by different pixels in different frames, because adjacent pixels have similar patterns in space and time; it is not necessary to build a background model for each pixel in a texture-consistent region. Such a sharing framework can dynamically associate pixels with models and thereby reduce the total number of models. On the other hand, existing approaches can be naturally embedded into the model sharing framework. Pixel-based approaches such as GMM [1] and ViBe [4] can be embedded into our framework regardless of the feature or model they use, because the framework realizes a many-to-one sharing mechanism between pixels and models without affecting the feature or mathematical representation of the background models; region- or block-based approaches, by contrast, cannot be embedded. To verify the effectiveness of the proposed framework, we apply it to both the statistical model and the sample consensus model. With the sharing framework, the noise caused by small local movements can be effectively eliminated by dynamically searching for a shared model in the neighborhood. Meanwhile, the number of models is reduced remarkably, and the shared models outperform the original models.
Related work
As a task fundamental to applications such as image classification [12], [13], [14], [15], salient object detection [16], image annotation [17], and person re-identification [18], background modeling has been addressed by many approaches in the literature. Among them, the GMM is known to be effective in sustaining background variations. The GMM [1] is a typical representative of pixel-based approaches, which assumes that the historical color intensities at each pixel can be modeled
Model sharing framework
To exploit the spatial-temporal correlation of pixels, we propose a unified model sharing framework for moving object detection. A sharing mechanism is presented to model the many-to-one relationship between pixels and models, and each pixel dynamically searches the best matched model in the neighborhood. Furthermore, the shared models are updated through a randomly sampling strategy, i.e., randomly selecting a pixel that matches the shared model for update. The overall flow of the proposed
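The search-and-share step described above can be sketched as follows. This is an illustrative simplification assuming grayscale values and one candidate model mean per position; the function name, search radius, and matching threshold are hypothetical, not the paper's actual parameters.

```python
import numpy as np

def best_shared_model(pixel_value, model_means, y, x, radius=1, thresh=10.0):
    """Search the (2*radius+1)^2 neighborhood for the best-matching model.

    model_means: per-position model means. Because of the many-to-one
    sharing, a pixel may adopt a neighbor's model instead of its own.
    Returns the (y, x) index of the matched model, or None if nothing
    in the neighborhood is close enough (a foreground candidate).
    """
    h, w = model_means.shape
    best, best_d = None, thresh
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                d = abs(pixel_value - model_means[ny, nx])
                if d < best_d:
                    best, best_d = (ny, nx), d
    return best

means = np.array([[50.0, 52.0, 200.0],
                  [51.0, 90.0, 55.0],
                  [49.0, 53.0, 54.0]])
# The center pixel's own model (mean 90) does not match the value 52,
# but a neighboring model does, so the pixel can still be background.
print(best_shared_model(52.0, means, 1, 1))  # (0, 1)
```

The randomized update mentioned above would then pick one of the pixels matched to a shared model at random and blend its value into that model, so that a model shared by many pixels is not updated by all of them at every frame.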
Shared statistical model
GMM is a classic representative of statistical approaches to moving object detection, which assumes that the temporal evolution of a pixel is modeled by a mixture of Gaussians. Through the sharing mechanism, the statistical model can easily be embedded into our framework. Specifically, shared models are built for both background and foreground, and each pixel dynamically searches for the optimal model among neighboring models according to its color feature. Thus the noise caused by local
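As a rough illustration of the statistical side, the test below checks a pixel value against a mixture that may be shared by a whole texture-consistent region. It loosely follows the standard GMM heuristic of [1] (components sorted by weight, cumulative weight cutoff); the function name, `k`, and `bg_weight` are illustrative assumptions.

```python
import math

def matches_mixture(value, components, k=2.5, bg_weight=0.7):
    """Return True if value fits a background component of a shared mixture.

    components: list of (weight, mean, var) tuples. Components are tried
    in order of decreasing weight and counted as background until their
    cumulative weight reaches bg_weight.
    """
    cum = 0.0
    for weight, mean, var in sorted(components, reverse=True):
        if abs(value - mean) <= k * math.sqrt(var):
            return True
        cum += weight
        if cum >= bg_weight:
            break
    return False

# One mixture shared by a texture-consistent region: e.g. sky pixels
# around intensity 120, with a weaker secondary mode around 180.
shared = [(0.8, 120.0, 25.0), (0.2, 180.0, 25.0)]
print(matches_mixture(118.0, shared))  # True  (background)
print(matches_mixture(60.0, shared))   # False (foreground candidate)
```

In a shared setting, a single such mixture would serve every pixel that matches it, instead of each pixel maintaining its own copy.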
Shared sample consensus model
Moving object detection with sample consensus is a non-parametric approach in which each model is represented by a sequence of historical samples, as in ViBe [4], PBAS [5], and SuBSENSE [6]. Similar to the shared statistical model, the sample consensus model can also be embedded into the sharing framework. For a given pixel, we extract color and texture features and then dynamically search for a matched model within the shared region. The sharing framework
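The consensus test and the randomized update common to this family of methods [4], [5], [6] can be sketched as follows, here on scalar intensities only; the radius, minimum match count, and function names are illustrative, not the parameters of any cited method.

```python
import random

def classify_sample_consensus(value, samples, radius=20, min_matches=2):
    """ViBe-style test: foreground unless enough stored samples are close."""
    matches = sum(1 for s in samples if abs(value - s) <= radius)
    return matches < min_matches  # True => foreground

def update_samples(value, samples, rng=random):
    """Randomly replace one stored sample with the current value, so old
    samples decay smoothly rather than in strict temporal order."""
    samples[rng.randrange(len(samples))] = value

model = [100, 102, 98, 101, 99, 103]  # historical samples of one model
print(classify_sample_consensus(101, model))  # False (background)
print(classify_sample_consensus(180, model))  # True  (foreground)
update_samples(101, model)
```

Under sharing, the sample list would belong to a region of similar pixels, and the randomized update would draw from any pixel currently matched to it.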
Experiments
To evaluate the performance of the proposed model sharing framework, we perform experiments on the public ChangeDetection 2014 benchmark [21], which provides a realistic, camera-captured, diverse set of videos. A total of 53 video sequences with human-labeled ground truth are used for testing. The video sequences are separated into 11 categories based on different types of challenges, as shown in Table 4 and Table 5.
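Results on such benchmarks are commonly summarized by precision, recall, and F-measure computed from pixel-level counts of true positives, false positives, and false negatives; a minimal sketch (the counts in the example are made up for illustration):

```python
def f_measure(tp, fp, fn):
    """Pixel-level precision/recall/F-measure as used on change-detection
    benchmarks: tp/fp/fn are foreground pixel counts against ground truth."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 900 foreground pixels detected correctly, 100 false alarms,
# 50 foreground pixels missed.
print(round(f_measure(900, 100, 50), 4))  # 0.9231
```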
Conclusion
To fully exploit the spatial-temporal correlation across pixels, we propose a simple but effective framework that learns shared models for moving object detection. By dynamically establishing a many-to-one relationship between pixels and models, we allow pixels with similar features to share the same model. Using the sharing framework, the noise caused by small local movements can be effectively eliminated, and the number of models is reduced remarkably. To verify the
Acknowledgments
This work was supported by the 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034 and 61332016).
References (24)
- C. Stauffer, W.E.L. Grimson, Adaptive background mixture models for real-time tracking, in: CVPR, vol. 2, IEEE, Fort...
- Z. Zivkovic, Improved adaptive Gaussian mixture model for background subtraction, in: ICPR, vol. 2, IEEE, Cambridge, UK,...
- A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, in: ECCV, Springer, Dublin, Ireland,...
- O. Barnich, M. Van Droogenbroeck, ViBe: a powerful random technique to estimate the background in video sequences, in: ICASSP,...
- M. Hofmann, P. Tiefenbacher, G. Rigoll, Background segmentation with feedback: the pixel-based adaptive segmenter, in:...
- P.-L. St-Charles et al., SuBSENSE: a universal change detection method with local adaptive sensitivity, IEEE Trans. Image Process. (2015)
- X. Fang, W. Xiong, B. Hu, L. Wang, A moving object detection algorithm based on color information, in: Journal of...
- S. Varadarajan, P. Miller, H. Zhou, Spatial mixture of Gaussians for dynamic background modelling, in: AVSS, IEEE,...
- G.-A. Bilodeau, J.-P. Jodoin, Change detection in feature space using local binary similarity patterns, in: CRV, IEEE,...
- Automatic target detection based on background modeling using adaptive cluster density estimation (2007)
- An efficient, chromatic clustering-based background model for embedded vision platforms, Comput. Vis. Image Underst.
- Manifold regularized multitask learning for semi-supervised multilabel image classification, IEEE Trans. Image Process.