1 Introduction

Low-cost RGBD sensors are being successfully used in several indoor video surveillance applications. Many of these applications rely on a scene background model, learned from data, to detect moving objects that are then further processed and analyzed.

Background subtraction from color video data is a widely studied problem, as witnessed by several recent surveys [1, 6, 19, 21]. Main challenges include illumination changes (where the background model should adapt to both strong and mild illumination changes), color camouflage (where foreground objects whose color is very close to that of the background are hard to segment), shadows cast by foreground objects occluding the visible light, bootstrapping (where the background model should be properly set up even in the absence of a training set free of moving foreground objects), and the so-called intermittent motion, referring to videos known for causing “ghosting” artifacts in the detected motion, i.e., foreground objects that should be detected even if they stop moving (abandoned objects) or if they were initially stationary and then start moving (removed objects).

Depth data is particularly attractive for background subtraction, since it is not affected by illumination changes or color camouflage; thus, some background modeling approaches based only on depth have been proposed [9, 20]. However, depth data suffers from other types of problems, such as depth camouflage (where foreground objects whose depth is very close to that of the background are hard to segment) and out of sensor range (where the sensor produces invalid depth values for foreground or background objects that are too close to or too far from it). Moreover, depth data shares with color data other challenges, including intermittent motion, bootstrapping, and shadows cast by foreground objects occluding the IR light coming from the emitter.

Many recent methods try to exploit the complementary nature of the color and depth information acquired with RGBD sensors. Generally, these methods either extend to RGBD data well-known background models originally designed for color data [8, 12], or model the scene background (and sometimes also the foreground) based on color and depth independently and then combine the results on the basis of different criteria [5, 10, 13, 15].

The method proposed in this paper belongs to the latter class of methods. Two background models are constructed for color and depth information, exploiting a self-organizing neural background model previously adopted for RGB videos [18]. The resulting color and depth detection masks are then combined to achieve the final detection masks, also used to better guide the selective model update procedure.

2 RGBD-SOBS Algorithm

The proposed algorithm for background subtraction using RGBD video data exploits the background model constructed and maintained in the SC-SOBS algorithm [18], originally designed for RGB data. It is based on the idea of building a neural background model of the image sequence by learning, in a self-organizing manner, image sequence variations, seen as trajectories of pixels in time. Two separate models are constructed for color and depth data, and their resulting background subtraction masks are suitably combined in order to update the models and to achieve the final result. In the following, we provide a self-contained description of the color and depth models, referring to [18] for further details on the original neural model, and of the combination criterion.

2.1 The Color Model

Given the color image sequence \(\left\{ I_1, \ldots , I_T \right\} \), at each time instant t we build and update a neuronal map for each pixel \(\mathbf p \), consisting of \(n \!\times \! n\) weight vectors \(cm_t^{i,j}(\mathbf p ), i,j\,=\,0, \ldots , n-1\), which will be called the color model for pixel \(\mathbf p \) and will be indicated as \(CM_t(\mathbf p )\):

$$\begin{aligned} CM_t(\mathbf p ) = \left\{ cm_t^{i,j}(\mathbf p ), \; i,j =0, \ldots , n-1 \right\} . \end{aligned}$$
(1)

If every sequence image has size \(N\,\times \,P\), the complete set of models \(CM_t(\mathbf p )\) for all pixels \(\mathbf p \) of the t-th sequence image \(I_t\) is organized as a 2D neuronal map \(CB_t\) of size \((n \!\times \! N) \!\times \! (n \!\times \! P)\), where the weight vectors \(cm_t^{i,j}(\mathbf p )\) for the generic pixel \(\mathbf p =(x,y)\) are at neuronal map position \((n \!\times \! x + i, n \!\times \! y + j)\), \(i, j = 0, \ldots , n-1\):

$$\begin{aligned} CB_t(n \!\times \! x + i, n \!\times \! y + j) = cm_t^{i,j}(\mathbf p ), \; i,j =0, \ldots , n-1. \end{aligned}$$
(2)

Although redundant, both notations \(CM_t\) and \(CB_t\) introduced in Eqs. (1) and (2) will be adopted. Indeed, the color model \(CM_t(\mathbf p )\) will be used to indicate the whole set of color weight vectors for a single pixel \(\mathbf p \) at time t, helping to focus on the pixelwise representation of the background model. On the other hand, the neuronal map \(CB_t\) will be used to refer to the whole color background model of an image sequence at time t, highlighting the spatial relationships among the weight vectors of adjacent pixels (see Eq. (7)).

Differently from [18], for color model initialization, we construct a color image CE that is an estimate of the color scene background. Then, for each pixel \(\mathbf p \), the corresponding weight vectors of the color model \(CM_0(\mathbf p )\) are initialized with the pixel color value \(CE(\mathbf p )\):

$$\begin{aligned} cm_0^{i,j}(\mathbf p ) = CE(\mathbf p ), \; \; \; i,j=0, \ldots , n-1. \end{aligned}$$
(3)

Among the several state-of-the-art background estimation methods [2] for constructing CE, in the experiments we have chosen the LabGen algorithm [14], which is one of the best-performing methods on the SBMnet dataset. Specifically, LabGen was run over the first L color frames, where L = 100.
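As a purely illustrative sketch (not the authors' implementation), the initialization of Eqs. (2) and (3) can be written in a few lines of NumPy. The function name init_color_model, the use of NumPy, and the treatment of CE as an H x W x 3 array are our assumptions, and the default n = 3 is only a placeholder for the model size chosen in Table 1.

import numpy as np

def init_color_model(CE, n=3):
    # Sketch of Eq. (3): every pixel of the background estimate CE (H x W x 3)
    # is replicated into an n x n block of identical weight vectors, laid out
    # as in Eq. (2), yielding the neuronal map CB_0 of shape (n*H, n*W, 3).
    CE = np.asarray(CE, dtype=float)
    return np.kron(CE, np.ones((n, n, 1)))

Calling init_color_model(CE) once before the training phase yields \(CB_0\); the same replication scheme, applied to DE, would initialize the depth neuronal map \(DB_0\) of Sect. 2.2.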

At each time step t, color background subtraction is achieved by comparing each pixel \(\mathbf p \) of the t-th sequence frame \(I_t\) with the current pixel color model \(CM_{t-1}(\mathbf p )\), to determine the weight vector \(BM_t^C(\mathbf p )\) that best matches it:

$$\begin{aligned} d(BM_t^C(\mathbf p ), I_t(\mathbf p )) = \min _{i,j=0, \ldots , n-1} d(cm_{t-1}^{i,j}(\mathbf p ), I_t(\mathbf p )). \end{aligned}$$
(4)

For the experiments reported in Sect. 3, the metric \(d(\cdot ,\cdot )\) is chosen as the Euclidean distance in the HSV color hexcone as in [18]. The color background subtraction mask for pixel \(\mathbf p \) is then computed as

$$\begin{aligned} M^C_t(\mathbf p ) = \left\{ \begin{array}{lll} 1 &{} &{} \mathrm{if } \; \; \; NCF_t(\mathbf p ) \le 0.5\\ 0 &{} &{} \mathrm{otherwise}\\ \end{array}, \right. \end{aligned}$$
(5)

where the Neighborhood Coherence Factor is defined as \(NCF_t(\mathbf p )\!=\!|\varOmega _\mathbf p |/|N_\mathbf p |\) [7]. Here \(| \cdot |\) refers to the set cardinality, \(N_\mathbf p \!=\!\{ \mathbf q \!: |\mathbf p -\mathbf q | \le h \}\) is a 2D spatial neighborhood of \(\mathbf p \) having width (2h + 1) \(\in \mathbb {N}\) (in the experiments h = 2), and

$$\begin{aligned} \varOmega _\mathbf p = \{ \mathbf q \in N_\mathbf p \!: (d(BM_t^C(\mathbf q ),I_t(\mathbf p )) \le \varepsilon ^C) \vee (shadow(BM_t^C(\mathbf q ),I_t(\mathbf p ))) \}. \end{aligned}$$
(6)

\(\varOmega _\mathbf p \) is the set of pixels \(\mathbf q \) belonging to \(N_\mathbf p \) whose background model contains a best match that either is close enough to the incoming pixel value or classifies it as a shadow of the background. \(\varepsilon ^C\) is a color threshold enabling the distinction between foreground and background pixels, while \(shadow(\cdot )\) is a function implementing the shadow detection mechanism adopted in [16]. It has been shown that the introduction of spatial coherence enhances the robustness of the background subtraction algorithm against false detections [17].
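For illustration only, the detection of Eqs. (4)-(6) can be sketched in NumPy/SciPy as follows. The sketch simplifies the method in several ways that we state explicitly: it uses the Euclidean distance in RGB instead of the HSV color-hexcone distance of [18], it omits the shadow test of Eq. (6), and it approximates \(\varOmega _\mathbf p \) by counting neighbors whose own best-match distance is below \(\varepsilon ^C\). Function and variable names are ours; the defaults \(\varepsilon ^C\!=\!0.008\) and h = 2 are the values given in the text.

import numpy as np
from scipy.ndimage import uniform_filter

def color_mask(I_t, CB, n=3, eps_c=0.008, h=2):
    # Simplified sketch of Eqs. (4)-(6).  I_t: (H, W, 3) color frame in [0, 1];
    # CB: (n*H, n*W, 3) color neuronal map.
    H, W, _ = I_t.shape
    # View CB as (H, W, n, n, 3): the n x n weight vectors of each pixel.
    models = CB.reshape(H, n, W, n, 3).transpose(0, 2, 1, 3, 4)
    diff = models - I_t[:, :, None, None, :]
    best_dist = np.sqrt((diff ** 2).sum(-1)).min(axis=(2, 3))        # Eq. (4)
    # Neighborhood Coherence Factor: fraction of pixels in the (2h+1)x(2h+1)
    # window around p that match their background model (simplified Eq. (6)).
    ncf = uniform_filter((best_dist <= eps_c).astype(float), size=2 * h + 1)
    return (ncf <= 0.5).astype(np.uint8)                             # Eq. (5): 1 = foreground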

An update of the color neuronal map is performed in order to adapt the color background model to scene modifications. At each time step t, the weight vectors of \(CB_{t-1}\) in a neighborhood of the best matching weight vector \(BM_t^C(\mathbf p )\) are updated according to a weighted running average. In detail, if \(BM_t^C(\mathbf p )\) is found at position \(\overline{\mathbf{p }}\) in \(CB_{t-1}\), then the weight vectors of \(CB_{t-1}\) are updated according to

$$\begin{aligned} CB_{t}(\mathbf q ) = (1-\alpha ^C_t(\mathbf p )) CB_{t-1}(\mathbf q ) + \alpha ^C_t(\mathbf p ) I_t(\mathbf p ) \; \; \; \forall \mathbf q \in N_{\overline{\mathbf{p }}}, \end{aligned}$$
(7)

where \(N_{\overline{\mathbf{p }}}=\left\{ \mathbf q : \left| \overline{\mathbf{p }} - \mathbf q \right| \le k \right\} \) is a 2D spatial neighborhood of \(\overline{\mathbf{p }}\) having width (2\(k+\) 1) \(\in \mathbb {N}\) (in the reported experiments k = 1). Moreover,

$$\begin{aligned} \alpha ^C_t(\mathbf p ) = \gamma \cdot G(\mathbf q -\overline{\mathbf{p }}) \cdot \left( 1 - M_t(\mathbf p ) \right) , \end{aligned}$$
(8)

where \(\gamma \) is the learning rate and \(G(\cdot ) = \mathcal{N}(\cdot ; \mathbf 0 , \sigma ^2 I)\) is a 2D Gaussian low-pass filter with zero mean and covariance \(\sigma ^2 I\) (in the reported experiments \(\sigma ^2\) = 0.75). The \(\alpha ^C_t(\mathbf p )\) values in Eq. (8) are weights that smoothly take into account the spatial relationship between the current pixel \(\mathbf p \) (through its best matching weight vector found at position \(\overline{\mathbf{p }}\)) and its neighboring pixels in \(I_t\) (through the weight vectors at positions \(\mathbf q \in N_{\overline{\mathbf{p }}}\)), thus preserving the topological properties of the input in the neural network update (close inputs correspond to close outputs). In [18], \(M_t(\mathbf p )\) is the background subtraction mask value \(M^C_t(\mathbf p )\) for pixel \(\mathbf p \), computed as in Eq. (5).
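A minimal per-pixel sketch of the update of Eqs. (7)-(8), under the assumptions that the best-match position \(\overline{\mathbf{p }}\) has already been located in the neuronal map, that the learning rate \(\gamma \) has already been scaled as discussed below, and that an unnormalized Gaussian kernel is used (a normalizing constant would only rescale \(\gamma \)); names are ours, not the authors':

import numpy as np

def update_color_model(CB, I_tp, p_bar, M_tp, gamma, k=1, sigma2=0.75):
    # Sketch of Eqs. (7)-(8) for a single pixel p.  CB is the (n*N, n*P, 3)
    # neuronal map (updated in place), I_tp the color of p, p_bar the position
    # of the best-matching weight vector BM_t^C(p) in CB, and M_tp the mask
    # value used for the selective update (0 = background, 1 = foreground).
    if M_tp != 0:        # factor (1 - M_t(p)) in Eq. (8): no update on foreground
        return
    I_tp = np.asarray(I_tp, dtype=float)
    x_bar, y_bar = p_bar
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            qx, qy = x_bar + dx, y_bar + dy
            if 0 <= qx < CB.shape[0] and 0 <= qy < CB.shape[1]:
                g = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma2))    # unnormalized Gaussian weight
                alpha = gamma * g                                    # Eq. (8)
                CB[qx, qy] = (1 - alpha) * CB[qx, qy] + alpha * I_tp # Eq. (7)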

In the usual case where a set of K initial sequence frames is available for training, the initialization and update procedures described above are applied to the first K sequence frames to train the neural network background model, which is then used for detection and update in all subsequent frames. What differentiates the training and the online phases in the proposed algorithm is the background subtraction mask \(M_t(\mathbf p )\) adopted in Eq. (8), besides the choice of parameters in Eqs. (6) and (8). Indeed, during the online phase, \(M_t(\mathbf p )\) is the combined mask value for pixel \(\mathbf p \) (see Sect. 2.3):

$$\begin{aligned} M_t(\mathbf p ) = \left\{ \begin{array}{lll} M^C_t(\mathbf p ) &{} &{} \mathrm{if } \; \; \; 1 \le t \le K\\ M^{Comb}_t(\mathbf p ) &{} &{} \mathrm{if } \; \; \; t > K\\ \end{array} \right. , \end{aligned}$$
(9)

in order to exploit depth information for the update of the color background model. The threshold \(\varepsilon ^C\) in Eq. (6) is chosen as \(\varepsilon ^C\!=\!\varepsilon ^C_1\) during training and \(\varepsilon ^C\!=\!\varepsilon ^C_2\) during the online phase, with \(\varepsilon ^C_2 \!\le \! \varepsilon ^C_1\), in order to include several observed pixel color variations during training and to obtain a more accurate color background model during the online phase (in the experiments, \(\varepsilon ^C_1\!=\!0.1\) and \(\varepsilon ^C_2\!=\!0.008\)). The learning rate \(\gamma \) in Eq. (8) is set as \(\gamma \!=\!\gamma _1 - t (\gamma _1-\gamma _2)/K\) during training and as \(\gamma \!=\!\gamma _2\) during the online phase, where \(\gamma _1\) and \(\gamma _2\) are predefined constants such that \(\gamma _2\!\le \!\gamma _1\), in order to ensure neural network convergence during the training phase and to adapt to scene variability during the online phase. In order for the \(\alpha ^C_t(\mathbf p )\) values in Eq. (7) to belong to [0,1], we set \(\gamma _1\!=\!c_1/\max \limits _{{\displaystyle \mathbf q \in N_{\overline{\mathbf{p }}}}} G(\mathbf q -\overline{\mathbf{p }})\) and \(\gamma _2\!=\!c_2/\max \limits _{{\displaystyle \mathbf q \in N_{\overline{\mathbf{p }}}}} G(\mathbf q -\overline{\mathbf{p }})\), with \(c_1\) and \(c_2\) constants such that \(0\!\le \!c_2\!\le \!c_1\!\le \) 1 (in the experiments, \(c_1\!=\!0.1\) and \(c_2\!=\!0.05\)). For a deeper explanation of the mathematical grounds behind the choice of the color model parameters, the interested reader is referred to [18].
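The parameter schedules described above can be summarized by the following sketch (hypothetical helpers; \(\gamma _1\) and \(\gamma _2\) are assumed already scaled by the factors \(c_1/\max G\) and \(c_2/\max G\) given in the text):

def learning_rate(t, K, gamma1, gamma2):
    # Learning-rate schedule: linear decay from gamma1 to gamma2 over the K
    # training frames, then constant gamma2 during the online phase.
    if t <= K:
        return gamma1 - t * (gamma1 - gamma2) / K
    return gamma2

def color_threshold(t, K, eps1=0.1, eps2=0.008):
    # Threshold of Eq. (6): looser during training, tighter during the online phase.
    return eps1 if t <= K else eps2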

2.2 The Depth Model

The neural model adopted for depth information is analogous to the one adopted for color information. The differences are mainly due to the special treatment of the invalid values that are inherent in the depth acquisition process.

Given the depth image sequence \(\left\{ D_1, \ldots , D_T \right\} \), at each time instant t we build and update a depth neuronal map for each pixel \(\mathbf p \). It consists of \(n\,\times \,n\) weight vectors \(dm_t^{i,j}(\mathbf p ), i,j\) = 0, ..., n - 1, which will be called the depth model for pixel \(\mathbf p \) and will be indicated as \(DM_t(\mathbf p )\):

$$\begin{aligned} DM_t(\mathbf p ) = \left\{ dm_t^{i,j}(\mathbf p ), \; i,j =0, \ldots , n-1 \right\} . \end{aligned}$$
(10)

Analogously to the case of the color model, the complete set of models \(DM_t(\mathbf p )\) for all pixels \(\mathbf p \) of the t-th depth frame \(D_t\) is organized as a 2D neuronal map \(DB_t\) of size \((n\,\times \,N)\,\times \,(n\,\times \,P)\).

For depth model initialization, an estimate DE of the depth scene background is constructed based on the observation that the scene background is generally farther from the camera than the foreground. Therefore, DE is obtained by retaining, for each pixel, the highest depth value observed in the first L depth frames. Then, for each pixel \(\mathbf p \), the corresponding weight vectors of the depth model \(DM_0(\mathbf p )\) are initialized with the pixel depth value \(DE(\mathbf p )\):

$$\begin{aligned} dm_0^{i,j}(\mathbf p ) = DE(\mathbf p ), \; \; \; i,j=0, \ldots , n-1. \end{aligned}$$
(11)
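A minimal sketch of the construction of DE, under the assumption that invalid depth readings are coded as 0 (the actual coding is sensor-dependent and not specified here); names are ours:

import numpy as np

def estimate_depth_background(depth_frames, invalid=0):
    # Sketch of the depth background estimate DE: per-pixel maximum of the
    # valid depth values observed in the first L frames (the background is
    # assumed farther from the sensor than the foreground).
    # depth_frames: (L, H, W) array.
    D = np.array(depth_frames, dtype=float)     # copy, so the input is not modified
    D[D == invalid] = -np.inf                   # exclude invalid readings from the max
    DE = D.max(axis=0)                          # per-pixel maximum over the L frames
    DE[np.isneginf(DE)] = invalid               # pixels never observed valid stay invalid
    return DE

The resulting DE can then be replicated into \(DB_0\) exactly as done for the color model in Eq. (3).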

At each time step t, depth background subtraction is achieved by comparing each pixel \(\mathbf p \) of the t-th depth frame \(D_t\) having a valid value with the current pixel depth model \(DM_{t-1}(\mathbf p )\), to determine the closest weight vector \(BM_t^D(\mathbf p )\):

$$\begin{aligned} |BM_t^D(\mathbf p ) - D_t(\mathbf p )| = \min _{i,j=0, \ldots , n-1} | dm_{t-1}^{i,j}(\mathbf p ) - D_t(\mathbf p ) |. \end{aligned}$$
(12)

The depth background subtraction mask for pixel \(\mathbf p \) is then computed as

$$\begin{aligned} M^D_t(\mathbf p ) = \left\{ \begin{array}{lll} 2 &{} &{} \mathrm{if} \; (D_t(\mathbf p ) \; invalid)\\ 0 &{} &{} \mathrm{if} \; (D_t(\mathbf p ) \; valid) \wedge (BM_t^D(\mathbf p ) - D_t(\mathbf p ) \le \varepsilon ^D)\\ 1 &{} &{} \mathrm{otherwise}\\ \end{array}, \right. \end{aligned}$$
(13)

where \(\wedge \) denotes the logical AND operator and \(\varepsilon ^D\) is a predefined threshold. In the experiments, depth values are normalized in [0,1] and \(\varepsilon ^D\) is chosen as \(\varepsilon ^D_1\!=\!0.1\) during training and, in the online phase, as \(\varepsilon ^D_2\!=\!0.00075\) for 16-bit depth images and \(\varepsilon ^D_2\!=\!0.005\) for 8-bit depth images. According to Eq. (13), incoming pixels having an invalid depth value are signaled in the depth detection mask (being assigned the value 2), so as to be suitably treated in the mask combination step (see Sect. 2.3). Moreover, all pixels whose depth value is greater than all the weight vectors of their depth model are considered background pixels (being assigned the value 0). This is in line with the observation, already exploited in the depth model initialization step, that the scene background is generally farther from the camera than the foreground.
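The depth detection of Eqs. (12)-(13) may be sketched as follows; again, the coding of invalid readings as 0 and the function and variable names are our assumptions, and the default threshold is the 8-bit online value given above:

import numpy as np

def depth_mask(D_t, DB, n=3, eps_d=0.005, invalid=0):
    # Sketch of Eqs. (12)-(13).  D_t: (H, W) depth frame normalized to [0, 1];
    # DB: (n*H, n*W) depth neuronal map.  Returns 0 (background), 1 (foreground)
    # or 2 (invalid depth).
    H, W = D_t.shape
    models = DB.reshape(H, n, W, n).transpose(0, 2, 1, 3)        # (H, W, n, n)
    signed = (models - D_t[:, :, None, None]).reshape(H, W, -1)  # candidate BM - D_t values
    idx = np.abs(signed).argmin(axis=2)                          # closest weight vector, Eq. (12)
    best_signed = np.take_along_axis(signed, idx[..., None], axis=2)[..., 0]
    mask = np.ones((H, W), dtype=np.uint8)                       # default: foreground
    mask[best_signed <= eps_d] = 0                               # background, incl. pixels farther than the model
    mask[D_t == invalid] = 2                                     # invalid readings flagged for Sect. 2.3
    return mask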

Depth neuronal map update is also performed in order to adapt the depth background model to scene modifications. At each time step t and for each pixel \(\mathbf p \) having valid depth value \(D_t(\mathbf p )\), the weight vectors of \(DB_{t-1}\) in a neighborhood of a valid best matching weight vector \(BM_t^D(\mathbf p )\), found at position \(\overline{\mathbf{p }}\) in \(DB_{t-1}\), are updated according to

$$\begin{aligned} DB_{t}(\mathbf q ) = (1-\alpha ^D_t(\mathbf p )) DB_{t-1}(\mathbf q ) + \alpha ^D_t(\mathbf p ) D_t(\mathbf p ) \; \; \;\forall \mathbf q \in N_{\overline{\mathbf{p }}}, \end{aligned}$$
(14)

where \(\alpha ^D_t(\mathbf p ) = \gamma \cdot G(\mathbf q -\overline{\mathbf{p }}) \cdot \left( 1 - M^D_t(\mathbf p ) \right) ,\) and the remaining notation is defined as in Eqs. (7) and (8).

Moreover, during training, valid depth values for pixels that had invalid values in previous frames are included in the depth model. Specifically, weight vectors for a generic pixel \(\mathbf p =(x,y)\) that are still invalid at time t, \(1 \le t \le K\), are initialized with the valid depth value \(D_t(\mathbf p )\):

$$\begin{aligned} DB_t(n \!\times \! x + i, n \!\times \! y + j) = dm_t^{i,j}(\mathbf p ) = D_t(\mathbf p ), \; \; \; i,j=0, \ldots , n-1, 1 \le t \le K. \end{aligned}$$
(15)

This leads to learning a more complete depth background model during training. The process is not applied during the online phase, in order to avoid including in the depth model new valid values that might belong to foreground objects.
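A sketch of this backfilling step (Eq. (15)); here we assume that the depth neuronal map DB stores the invalid code (0 in this sketch) for never-initialized weight vectors, and that a pixel's model is either entirely valid or entirely invalid:

import numpy as np

def backfill_depth_model(DB, D_t, n=3, invalid=0):
    # Sketch of Eq. (15), applied only during training: pixels whose depth model
    # is still entirely invalid but whose current reading D_t(p) is valid get all
    # their n x n weight vectors set to D_t(p).  DB is modified in place.
    H, W = D_t.shape
    models = DB.reshape(H, n, W, n)                        # view sharing memory with DB
    never_seen = (models == invalid).all(axis=(1, 3))      # (H, W): model never initialized
    xs, ys = np.nonzero(never_seen & (D_t != invalid))
    models[xs, :, ys, :] = D_t[xs, ys][:, None, None]      # fill the whole n x n block of each such pixel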

2.3 Combining Color and Depth Masks

During online learning, the color mask \(M^C_t\) and the depth mask \(M^D_t\) are combined in order to produce a combined mask \(M^{Comb}_t\), which is adopted to selectively update the color model (see Eq. (9)). In case of invalid depth values (signaled by \(M^D_t(\mathbf p )\!=\!2\) in Eq. (13)), only color mask values are considered; otherwise, depth mask values are considered. Moreover, in order to reduce the adverse effect of noisy depth values around the object contours, contour pixels are signaled by setting \(M^D_t(\mathbf p )\!=\!3\) and color mask values are considered instead in these areas. Thus, the combined mask is computed as

$$\begin{aligned} M^{Comb}_t(\mathbf p ) = \left\{ \begin{array}{lll} M^C_t(\mathbf p ) &{} &{} \mathrm{if } \; \; \; M^D_t(\mathbf p )>1\\ M^D_t(\mathbf p ) &{} &{} \mathrm{otherwise}\\ \end{array} \right. . \end{aligned}$$
(16)
Fig. 1. Sequence genseq2 of the SBM-RGBD dataset: (a) color and depth images \(I_1\) and \(D_1\); (b) color and depth background estimates CE and DE; (c) color and depth images \(I_{159}\) and \(D_{159}\); (d) color and depth detection masks \(M^C_{159}\) and \(M^D_{159}\); (e) combined mask \(M^{Comb}_{159}\) and ground truth mask \(GT_{159}\). (Color figure online)

An example is provided in Fig. 1. Similarly to [10], the object contours are obtained as \(dil(\overline{M}^D_t) \!\wedge \! M^D_t\), where \(dil(\cdot )\) is the morphological dilation operator with a \(3\!\times \!3\) structuring element and \(\overline{M}^D_t\) denotes the complement of \(M^D_t\).
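The combination rule of Eq. (16), together with the contour flagging described above, can be sketched as follows; treating the contour as the foreground pixels of \(M^D_t\) that touch non-foreground pixels is our reading of \(dil(\overline{M}^D_t) \!\wedge \! M^D_t\), and the function names are ours:

import numpy as np
from scipy.ndimage import binary_dilation

def combine_masks(M_c, M_d):
    # Sketch of Sect. 2.3: flag noisy depth contours with the value 3, then use
    # the color mask wherever the depth mask is unreliable (values 2 or 3) and
    # the depth mask elsewhere (Eq. (16)).
    M_d = M_d.copy()
    fg = (M_d == 1)
    # dil(complement of M_d) AND M_d, as in [10]: depth-foreground pixels
    # adjacent to non-foreground pixels (3 x 3 structuring element).
    contours = binary_dilation(~fg, structure=np.ones((3, 3), dtype=bool)) & fg
    M_d[contours] = 3
    return np.where(M_d > 1, M_c, M_d).astype(np.uint8)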

2.4 The Algorithm

The above-described per-pixel procedure is sketched as the RGBD-SOBS algorithm reported in Fig. 2.

Fig. 2. RGBD-SOBS algorithm for pixel \(\mathbf p \).

3 Experimental Results

Experiments have been carried out on the SBM-RGBD dataset [3, 4], consisting of 33 RGBD videos acquired with the Microsoft Kinect and spanning 7 categories that cover diverse scene background modelling challenges (see Sect. 1): Illumination Changes (IC), Color Camouflage (CC), Depth Camouflage (DC), Intermittent Motion (IM), Out of sensor Range (OR), Shadows (Sh), and Bootstrapping (Bo).

Table 1. Parameter values adopted for evaluating the RGBD-SOBS algorithm.

The parameter values for the RGBD-SOBS algorithm common to all the SBM-RGBD videos are summarized in Table 1. In practice, all the default SC-SOBS parameter values [18], well established on the CDnet dataset [11], have been chosen for the color model, while an analysis analogous to the one reported in [16] has been carried out to choose the initialization and depth model parameters.

Accuracy is evaluated in terms of seven well-known metrics [3]: Recall (Rec), Specificity (Sp), False Positive Rate (FPR), False Negative Rate (FNR), Percentage of Wrong Classifications (PWC), Precision (Prec), and F-Measure (\(F_1\)).
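For completeness, the seven metrics can be computed from the per-video counts of true/false positives and negatives as in the following sketch (the standard definitions commonly associated with these names; the helper name is ours):

def metrics(TP, FP, TN, FN):
    # Standard definitions of the seven metrics from per-video confusion counts.
    rec = TP / (TP + FN)                               # Recall
    sp = TN / (TN + FP)                                # Specificity
    fpr = FP / (FP + TN)                               # False Positive Rate
    fnr = FN / (TP + FN)                               # False Negative Rate
    pwc = 100.0 * (FP + FN) / (TP + FP + TN + FN)      # Percentage of Wrong Classifications
    prec = TP / (TP + FP)                              # Precision
    f1 = 2 * prec * rec / (prec + rec)                 # F-Measure
    return rec, sp, fpr, fnr, pwc, prec, f1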

Table 2. Average performance results of RGBD-SOBS and RGB-SOBS in each category of the SBM-RGBD dataset. In boldface the best results for each metric.

Average performance metrics achieved by the proposed RGBD-SOBS algorithm are reported in Table 2 for all the categories of the SBM-RGBD dataset, showing that, on average, RGBD-SOBS performs quite well. Moreover, comparisons with the RGB-SOBS results (obtained by RGBD-SOBS using only color information) clearly show that the exploitation of depth information helps in achieving much higher performance. This is particularly true for disambiguating color camouflage, as exemplified in the first two rows of Fig. 3. Great improvement is also achieved for the Intermittent Motion category, where depth information can be easily exploited to detect both abandoned and removed objects and, thus, can help drive the update of the color model in a much more consistent way (see Fig. 3, third and fourth rows). Indeed, since selectivity prevents the model update in foreground areas, a correct classification of the removed foreground (e.g., the box that was originally on the floor, as shown in Fig. 4) is essential for achieving an accurate background model and, consequently, accurate detection results.

Fig. 3. Sequences colorCam1 (CC, 1st and 2nd rows) and abandoned1 (IM, 3rd and 4th rows): (a) color images; (b) depth images; (c) ground truth masks; masks computed by (d) RGBD-SOBS and (e) RGB-SOBS. (Color figure online)

Fig. 4. Sequence abandoned1: detail of the color background models for RGBD-SOBS at frames (a) 161 and (b) 185, and for RGB-SOBS at frames (c) 161 and (d) 185. (Color figure online)

Fig. 5. Sequences BootStrapping_ds (Bo, 1st row), shadows2 (Sh, 2nd row), ChairBox (IC, 3rd row), and DCamSeq2 (DC, 4th row): (a) color images; (b) depth images; (c) ground truth masks; masks computed by (d) RGBD-SOBS and (e) RGB-SOBS. (Color figure online)

Other well-known challenges, such as bootstrapping, illumination changes, and shadows, are handled fairly well also by RGB-SOBS (see Fig. 5). It should be observed that, although the performance results are on average comparable for the Depth Camouflage category, the strategy for combining color and depth masks (see Sect. 2.3) can sometimes lead to results worse than those obtained using only color information, as shown in the last row of Fig. 5, suggesting that further work is still needed to accurately handle this challenge.

4 Conclusions and Perspectives

The paper proposes the RGBD-SOBS algorithm for detecting moving objects in RGBD video sequences. Two background models are constructed for color and depth information, exploiting a self-organizing neural background model previously adopted for RGB videos. The resulting color and depth detection masks are combined, not only to achieve the final results, but also to better guide the selective model update procedure. The evaluation of the algorithm on the SBM-RGBD dataset shows that the exploitation of depth information helps in achieving much higher performance than just using color. This is true not only for sequences showing color camouflage, but also for those including many other color and depth background maintenance challenges (e.g., intermittent motion, bootstrapping, and out of sensor range data). Further work will be devoted to specifically handling the depth camouflage challenge, for which only fair results are achieved by the proposed method.