A model for dynamic object segmentation with kernel density estimation based on gradient features
Introduction
Dynamic object segmentation techniques are necessary in many vision surveillance applications. Because the scenes in real videos are complicated, the search for high-quality segmentation methods is challenging. In recent years, related research has produced several effective methods in this field. The mixture-of-Gaussians method [6], [7], [8] is the typical example of parametric non-predictive methods, and it can adapt to continuous background changes. However, its adaptability to background changes is limited by the fixed, empirically chosen number of background types. The segmentation methods based on kernel density estimation [2], [3] are non-parametric non-predictive methods; they can adapt to more complex background changes than the mixture-of-Gaussians methods, because the number of background types is dynamic and derived from recent samples. Besides the mixture-of-Gaussians and kernel density estimation models, other probabilistic background models have also been presented [5].
Dynamic cast shadows and reflection images in videos are treated as fake objects in most applications, and they seriously degrade the segmentation quality of many published methods. There are mainly two kinds of approaches to cast shadow suppression. One is to eliminate cast shadow regions using object shape structure [9], and the other is to classify cast shadow pixels as background using a pixel color space [10], [14]. Kernel density estimation methods based on pixel color space can suppress dynamic cast shadows effectively [1]. For intensity videos, however, conventional kernel density estimation methods cannot suppress cast shadows because of their inherent reliance on color. Moreover, these methods cannot suppress reflection images at all. Because reflection images are not prominent in all videos, most published methods do not address them. To solve these problems, we provide a different kernel density estimation model based on dynamic gradient features.
Four pixel features used in kernel density estimation models will be described in this paper: intensity, color, gradient magnitude, and gradient direction. The last two features together may be labeled as the gradient, or gradient features. Three kernel density estimation models will be described and compared. The first is the intensity model based on pixel intensity [1]. It is a basic model that cannot suppress cast shadows or reflection images, but it provides the basic segmentation theory for the other two models. The second is the rgs model based on pixel color and intensity [1]. It combines color with the intensity model and can suppress cast shadows in color videos. The third is the new model presented in this paper, based on pixel gradient and intensity; we refer to it as the xgs model. It can suppress cast shadows in intensity videos. Furthermore, under diffuse illumination, which is common in natural scenes, it can suppress parts of reflection images in intensity videos, whereas reflection images are ignored by both the intensity and rgs models.
In fact, kernel density estimation has already been used in edge extraction [4], but in this paper we combine gradient computation and kernel density estimation for dynamic object segmentation. Furthermore, our research on the xgs model differs from known spatial–temporal background modeling works such as references [11], [12]. First, in a spatial–temporal background model, the spatial gradient and the temporal gradient are treated as two elements of one integral vector and are not suitable to be detached for analysis [11]; in the xgs model, the research focus is the spatial gradient and its change over time. Secondly, the spatial–temporal background model aims to solve problems such as object segmentation with dropped frames, camera motion, and weak assumptions about object appearance, rather than cast shadow and reflection image suppression, which are the main purposes of the xgs model.
We organize the content as follows. In Section 2, we briefly describe the intensity model and the rgs model presented in reference [1]. In Section 3, we describe the gradient properties relevant to cast shadow and reflection image suppression, and we compare probability estimations of different pixel features in different situations. In Section 4, we introduce the xgs model. In Section 5, we present experimental results based on real videos and draw conclusions from our research work.
Section snippets
Intensity model
Let {x1, x2, …, xN} be an intensity sample set of a pixel, where each sample comes from the corresponding frame in a video. Based on the sample set, we can form an estimate of the pixel intensity pdf (probability density function) [1]. Given the intensity observation xt of this pixel at time t, we can estimate the probability of this observation with the following formula [1]:

Pr(x_t) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_t - x_i)^2}{2\sigma^2}}   (1)

The Gaussian in formula (1) is a kernel function. This formula presents the basic idea of kernel density estimation based segmentation.
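As a concrete illustration, the intensity-model probability estimate of formula (1) can be sketched in Python as follows. The kernel bandwidth sigma and the foreground threshold mentioned in the comment are illustrative choices, not values taken from the paper:

```python
import numpy as np

def kde_probability(x_t, samples, sigma):
    """Estimate Pr(x_t) for a pixel from its N most recent intensity
    samples with a Gaussian kernel, as in formula (1)."""
    samples = np.asarray(samples, dtype=float)
    diffs = x_t - samples
    kernels = np.exp(-diffs**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)
    # Average the kernel responses over the N samples.
    return kernels.mean()
```

In use, a pixel is labeled foreground when Pr(x_t) falls below a fixed threshold, i.e. when the observation is poorly explained by the recent background samples.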
Analysis and comparison of dynamic pixel features
In this section, we will compare the probability estimations of four pixel features in different situations: the probability estimations of intensity, color vector, gradient magnitude, and gradient direction. The gradient features here are all intensity gradient features, and we adopt the classic definitions of the gradient magnitude and direction at a pixel. The gradient magnitude is represented as |g|. The gradient direction is a unit vector with two dimensions, which we represent as x.
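A minimal sketch of these two gradient features follows, using central differences as the gradient operator; the paper's exact discrete gradient operator is not specified in this snippet, so central differences are an assumption for illustration:

```python
import numpy as np

def gradient_features(frame):
    """Compute the per-pixel gradient magnitude |g| and the unit
    gradient direction vector from an intensity image."""
    frame = np.asarray(frame, dtype=float)
    gy, gx = np.gradient(frame)              # derivatives along rows, columns
    mag = np.hypot(gx, gy)                   # gradient magnitude |g|
    eps = 1e-12                              # avoid division by zero in flat regions
    direction = np.stack((gx, gy), axis=-1) / (mag[..., None] + eps)
    return mag, direction
```

For example, on a horizontal intensity ramp the magnitude is constant and the direction vector points along the ramp, which is the behavior the kernel density estimates of |g| and the direction rely on.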
Kernel density estimation model based on dynamic gradient features
In this section, we present a new kernel density estimation model, the xgs model. The probability estimations of intensity, gradient magnitude, and gradient direction are adopted in the xgs model, and we label these three probability estimations as Prs, Prg, and Prx, respectively. The functions of the three pixel features in the xgs model are summarized in Table 1. The pixel displacement probability PN and the component displacement probability PC based on the intensity sample set, which are defined in the intensity model [1], are also adopted.
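One plausible way to combine the three estimates can be sketched as follows. This is a hypothetical decision rule, not the exact combination of Table 1 (which is not reproduced in this snippet); the thresholds t_s, t_g, t_x are illustrative. The intuition it encodes is the one stated above: a cast shadow changes a pixel's intensity but largely preserves its gradient structure.

```python
def classify_pixel(pr_s, pr_g, pr_x, t_s=0.05, t_g=0.05, t_x=0.05):
    """Hypothetical combination of the xgs probability estimations:
    pr_s = intensity, pr_g = gradient magnitude, pr_x = gradient direction."""
    if pr_s > t_s:
        # Intensity is well explained by the background samples.
        return "background"
    if pr_g > t_g and pr_x > t_x:
        # Intensity changed, but gradient magnitude and direction still
        # match the background: typical of a cast shadow or a diffuse
        # reflection, so suppress it as background.
        return "background"
    return "foreground"
```

A true moving object usually disturbs both the intensity and the gradient features, so it still falls through to the foreground label.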
Experiment results analysis and conclusion
In Fig. 8, we provide eight segmentation results of the different models based on eight different video clips. The segmentation results of the rgs model are based on color videos, while those of the intensity and xgs models are based on intensity videos. In the experiments, fixed thresholds are applied to all probability estimations in the intensity, rgs, and xgs models. The thresholds of Prs, PN, and PC are the same in the intensity, rgs, and xgs models; in the rgs model, the probability estimation of color is also applied.
Acknowledgement
We are grateful to the Science Foundation of Sichuan Province (2006J13-092) of the People's Republic of China for financial support.
References (14)
- A. Elgammal, R. Duraiswami, D. Harwood, L.S. Davis, Background and foreground modeling using nonparametric kernel...
- A. Mittal, N. Paragios, Motion-based background subtraction using adaptive kernel density estimation, in: CVPR’04, vol....
- A. Elgammal, R. Duraiswami, L.S. Davis, Efficient non-parametric adaptive color modeling using fast Gauss transform, in:...
- G. Economou, A. Fotinos, S. Makrogiannis, S. Fotopoulos, Color image edge detection based on nonparametric density...
- J. Rittscher, J. Kato, S. Joga, A. Blake, A probabilistic background model for tracking, in: ECCV’00, Springer-Verlag,...
- C. Stauffer, W.E.L. Grimson, Adaptive background mixture models for real-time tracking, in: CVPR'99, vol. 2, IEEE...
- W.E.L. Grimson, C. Stauffer, R. Romano, L. Lee, Using adaptive tracking to classify and monitor activities in a site,...