On-line EM Variants for Multivariate Normal Mixture Model in Background Learning and Moving Foreground Detection

Abstract

The unsupervised learning of multivariate mixture models from on-line data streams has attracted the attention of researchers because of its usefulness in real-time intelligent learning systems. Compared with some popular numerical methods, the EM algorithm is an ideal choice for iteratively obtaining maximum likelihood estimates of the parameters of an assumed finite mixture. However, the original EM is a batch algorithm that works only on fixed datasets. To endow the EM algorithm with the ability to process streaming data, two on-line variants are studied: Titterington’s method and a sufficient statistics-based method. We first prove that the two on-line EM variants are theoretically feasible for training the multivariate normal mixture model by showing that the model belongs to the exponential family. The two on-line learning schemes for multivariate normal mixtures are then applied to the problems of background learning and moving foreground detection. Experiments show that the two on-line EM variants efficiently update the parameters of the mixture model and generate reliable backgrounds for moving foreground detection.

References

  1. Figueiredo, M.A.T., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)

  2. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, Stat. Methodol. 39(1), 1–38 (1977)

  3. Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)

  4. Titterington, D.M.: Recursive parameter estimation using incomplete data. J. R. Stat. Soc., Ser. B, Methodol. 46(2), 257–267 (1984)

  5. Fabian, V.: On asymptotically efficient recursive estimation. Ann. Stat. 6, 854–866 (1978)

  6. Zivkovic, Z., van der Heijden, F.: Recursive unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 651–656 (2004)

  7. Neal, R., Hinton, G.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan, M.I. (ed.) Learning in Graphical Models, pp. 355–368. Kluwer Academic, Dordrecht (1998)

  8. Cappe, O., Moulines, E.: On-line expectation-maximization algorithm for latent data models. J. R. Stat. Soc., Ser. B, Methodol. 71, 593–613 (2009)

  9. Arcidiacono, P., Jones, J.B.: Finite mixture distributions, sequential likelihood and the EM algorithm. Econometrica 71(3), 933–946 (2003)

  10. Xu, L., Jordan, M.: On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comput. 8, 129–151 (1996)

  11. Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood, and the EM algorithm. SIAM Rev. 26, 195–239 (1984)

  12. Sato, M., Ishii, S.: On-line EM algorithm for the normalized Gaussian network. Neural Comput. 12, 407–432 (2000)

  13. Makov, U.E., Smith, A.F.M.: A quasi-Bayes unsupervised learning procedure for priors. IEEE Trans. Inf. Theory 23(6), 761–764 (1977)

  14. Smith, A.F.M., Makov, U.E.: A quasi-Bayes sequential procedure for mixtures. J. R. Stat. Soc., Ser. B, Methodol. 40, 106–112 (1978)

  15. Wang, S., Zhao, Y.: Almost sure convergence of Titterington’s recursive estimator for mixture models. Stat. Probab. Lett. 76, 2001–2006 (2006)

  16. Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780–785 (1997)

  17. Stauffer, C., Grimson, W.E.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 246–252 (1999)

  18. Stauffer, C., Grimson, W.E.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–757 (2000)

  19. Zivkovic, Z.: Improved adaptive Gaussian mixture model for background subtraction. In: Proc. 17th Int. Conf. Pattern Recognition, pp. 28–31 (2004)

  20. Li, D., Xu, L., Goodman, E.: On-line background learning for illumination-robust foreground detection. In: Proc. 11th ICARCV, pp. 1093–1100 (2010)

  21. Available at: http://www.cvg.rdg.ac.uk/slides/pets.html

  22. Goyette, N., Jodoin, P.-M., Porikli, F., Konrad, J., Ishwar, P.: Changedetection.net: a new change detection benchmark dataset. In: Proc. IEEE Workshop on Change Detection (CDW’12) at CVPR’12, pp. 1–8 (2012)

  23. Cheng, J., Yang, J., Zhou, Y., Cui, Y.: Flexible background mixture models for foreground segmentation. Image Vis. Comput. 24, 473–482 (2006)

  24. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time foreground-background segmentation using codebook model. Real-Time Imaging 11(3), 167–256 (2005)

  25. Chang, F., Chen, C., Lu, C.: A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93(2), 206–220 (2004)

  26. Joo, S., Chellappa, R.: A multiple-hypothesis approach for multiobject visual tracking. IEEE Trans. Image Process. 16(11), 2849–2854 (2007)

Corresponding author

Correspondence to Lihong Xu.

Appendices

Appendix A: Proof of Theorem 1

The complete-data density (11) can be transformed into:

(45)

Decomposing (45) gives the following identifications:

$$ \mathbf{s}_{j}(\mathbf{x}) = \bigl[ \delta_{j},\delta_{j},\delta_{j}\mathit{vec} \bigl( \mathbf{yy}^{T} \bigr),\delta_{j}\mathbf{y}, \delta_{j} \bigr]^{T}, $$
(46)
(47)
$$ c ( \mathbf{x} ) = \sum_{j = 1}^{K} - \frac{m}{2}\delta_{j}\log ( 2\pi ) , $$
(48)
$$ b ( \boldsymbol{\theta} ) = 0 . $$
(49)

Thus, according to Definition 2, f(x|θ) as defined in (11) belongs to the exponential family.
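For concreteness, the grouping stated in (45)–(49) can be written out explicitly. The display below is only a sketch: it assumes that Definition 2 takes the standard exponential-family form \(f( \mathbf{x}|\boldsymbol{\theta} ) = \exp \{ \sum_{j} \mathbf{s}_{j}(\mathbf{x})^{T}\boldsymbol{\phi}_{j}(\boldsymbol{\theta}) + c(\mathbf{x}) + b(\boldsymbol{\theta}) \}\), and the natural-parameter vector \(\boldsymbol{\phi}_{j}(\boldsymbol{\theta})\) shown is simply the pairing implied by (46), (48) and (49); its symbol and layout may differ from those actually used in (45) and (47).

% Sketch only: exponential-family grouping implied by (46), (48), (49),
% assuming the standard form of Definition 2 stated above.
$$ f( \mathbf{x}|\boldsymbol{\theta} ) = \exp\Biggl\{ \sum_{j = 1}^{K} \delta_{j}\Bigl[ \log \omega_{j} - \tfrac{1}{2}\log \vert \mathbf{C}_{j} \vert - \tfrac{1}{2}\mathit{vec}\bigl( \mathbf{C}_{j}^{ - 1} \bigr)^{T}\mathit{vec}\bigl( \mathbf{yy}^{T} \bigr) + \mathbf{y}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} - \tfrac{1}{2}\boldsymbol{\mu}_{j}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} - \tfrac{m}{2}\log ( 2\pi ) \Bigr] \Biggr\}, $$
$$ \boldsymbol{\phi}_{j}(\boldsymbol{\theta}) = \Bigl[ \log \omega_{j},\ - \tfrac{1}{2}\log \vert \mathbf{C}_{j} \vert,\ - \tfrac{1}{2}\mathit{vec}\bigl( \mathbf{C}_{j}^{ - 1} \bigr),\ \mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j},\ - \tfrac{1}{2}\boldsymbol{\mu}_{j}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} \Bigr]^{T}. $$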

Appendix B: The Conditional Expectation of Score Functions

According to the definition of conditional expectation:

$$ \mathbb{E}_{\boldsymbol{\theta }} [ \mathbf{v} ] = \int\frac{1}{L(\boldsymbol{\theta} |\mathbf{y})} \cdot \frac{\partial L(\boldsymbol{\theta} |\mathbf{y})}{\partial \boldsymbol{\theta}} L(\boldsymbol{\theta} |\mathbf{y})\,dy . $$
(50)

The differentiation with respect to θ does not involve the integration variable y, so the two operations can be interchanged, and we have:

$$ \mathbb{E}_{\boldsymbol{\theta }} [ \mathbf{v} ] = \int\frac{\partial L(\boldsymbol{\theta} |\mathbf{y})}{\partial \boldsymbol{\theta}}\,dy = \frac{\partial}{\partial \boldsymbol{\theta}} \int L(\boldsymbol{\theta} |\mathbf{y})\,dy = \frac{\partial 1}{\partial \boldsymbol{\theta}} = \mathbf{0} . $$
(51)
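As a purely illustrative numerical check of this zero-mean property (not part of the original appendix), the short Python sketch below estimates the expected score of a univariate normal with respect to its mean by Monte Carlo; the sample average is close to zero when the score is evaluated at the true parameter value.

import numpy as np

# Monte Carlo check of E_theta[v] = 0 for a univariate normal N(mu, sigma^2):
# the score with respect to mu is v(y) = (y - mu) / sigma^2.
# Purely illustrative; not part of the original proof.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
y = rng.normal(mu, sigma, size=200_000)
score = (y - mu) / sigma**2
print("average score:", score.mean())  # approximately 0, as required by (17)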

Appendix C: Proof of Theorem 2

According to (11), the logarithm of the complete-data density is:

$$ \log f( \mathbf{x}|\boldsymbol{\theta} ) = \sum_{j = 1}^{K} \delta_{j}\log \bigl( \omega_{j}p_{j}( \mathbf{y}|\boldsymbol{\theta}_{j} ) \bigr) . $$
(52)

The parameters to be estimated are ω_j, μ_j, and C_j, which are mutually independent. Therefore the FIM of θ has a diagonal form, and we derive the components of I_c(θ) separately in the following three parts.

C.1 Derivation of Titterington’s Equation for ω_j

To compute I_c(ω), without loss of generality we take ω_1 as an example. Using the constraint \(\sum_{j = 1}^{K} \omega_{j} = 1\) and abbreviating \(p_{j}( \mathbf{y}|\boldsymbol{\theta}_{j} )\) to p_j, Eq. (52) can be rewritten in two forms:

(53)

and

(54)

Taking the derivative of (53) and (54) with respect to ω_1, we obtain:

$$ \mathbf{v}_{c}(\mathbf{y},\omega_{1}) = \frac{\partial \log f( \mathbf{x}|\boldsymbol{\theta} )}{\partial \omega_{1}} = \frac{\delta_{1}}{\omega_{1}} - \frac{\delta_{2}}{\omega_{2}} = \frac{\delta_{1}}{\omega_{1}} - \frac{\delta_{3}}{\omega_{3}}. $$
(55)

The Kronecker delta δ_j is a discrete random variable with the distribution listed in Table 2.

Table 2 The discrete probability distribution of δ_j: P(δ_j = 1) = ω_j and P(δ_j = 0) = 1 − ω_j

From Table 2 we have \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\omega_{1}) ] = 0\), which satisfies the property of scores in (17). Then I_c(ω_1) is constructed as:

Since the choice of ω_1 in (53)–(55) was arbitrary, we have in general:

$$ \mathbf{I}_{c}(\omega_{j}) = \frac{1}{\omega_{j}}. $$
(56)

From (56) we can see that I_c(ω_j) is a scalar. In order to obtain the incomplete-data score function v_g(y, ω_j), we modify the incomplete-data density g(y|θ) by adding a zero-valued term, in a manner analogous to constructing a Lagrange function; the modified incomplete-data density is denoted by g_s(y|θ) and is always equal in value to g(y|θ):

(57)

Then it can be derived that:

(58)

According to the properties of a probability density function, the expectation of (58) is computed as:

which also satisfies (17). Now, by substituting (56) and (58) into the recursion (13), we obtain the updating equation for ω_j formulated in (21).
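Since Eq. (21) itself lies outside this excerpt, the Python sketch below only illustrates the step that the recursion (13) yields for the weights under the quantities derived above. It assumes the incomplete-data score takes the usual responsibility form v_g(y, ω_j) = (r_j − ω_j)/ω_j, where r_j is the posterior responsibility of component j; combined with I_c(ω_j) = 1/ω_j from (56), the step θ ← θ + ρ I_c^{-1} v_g reduces to a small move of ω_j toward r_j, with gain ρ. The function and variable names (update_weights, rho, resp) are illustrative, not the paper’s notation.

import numpy as np
from scipy.stats import multivariate_normal

def update_weights(y, weights, means, covs, rho):
    # One Titterington-style step for the mixing weights (illustrative sketch).
    # Assumed forms: I_c(w_j) = 1/w_j as in (56) and v_g(y, w_j) = (r_j - w_j)/w_j,
    # so the recursion (13) reduces to w_j <- w_j + rho * (r_j - w_j).
    lik = np.array([w * multivariate_normal.pdf(y, mean=m, cov=c)
                    for w, m, c in zip(weights, means, covs)])
    resp = lik / lik.sum()              # posterior responsibilities r_j
    new_w = weights + rho * (resp - weights)
    return new_w / new_w.sum()          # renormalize to guard against drift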

C.2 Derivation of Titterington’s Equation for μ_j

To update the mean vectors, I_c(μ) must be obtained first. Taking the first derivative of (52) with respect to μ_j:

(59)

It is obvious that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\boldsymbol{\mu}_{j}) ] = 0\), which satisfies (17). Then I_c(μ_j) is given by:

(60)

Then the score function v_g(y, μ_j) is constructed as follows:

(61)

The expectation of (61) is,

(62)

which also satisfies (17). By substituting (60) and (61) into (13), we obtain the recursive estimate of μ given in (22).
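A companion sketch for the mean recursion follows (again, Eq. (22) itself is not reproduced here). It assumes the standard forms I_c(μ_j) = ω_j C_j^{-1} for (60) and v_g(y, μ_j) = r_j C_j^{-1}(y − μ_j) for (61), where r_j is the posterior responsibility; the C_j^{-1} factors then cancel in I_c^{-1} v_g and the step becomes μ_j ← μ_j + ρ (r_j/ω_j)(y − μ_j). Names are illustrative, not the paper’s notation.

import numpy as np

def update_mean(y, mean_j, weight_j, resp_j, rho):
    # Titterington-style step for one component mean (illustrative sketch).
    # Assumed forms: I_c(mu_j) = w_j * inv(C_j) and
    # v_g(y, mu_j) = r_j * inv(C_j) @ (y - mu_j), so I_c^{-1} v_g = (r_j / w_j) * (y - mu_j)
    # and (13) gives mu_j <- mu_j + rho * (r_j / w_j) * (y - mu_j).
    return mean_j + rho * (resp_j / weight_j) * (np.asarray(y) - mean_j)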

C.3 Derivation of Titterington’s Equation for C_j

As the behavior of Titterington’s method is unclear for score functions taken with respect to a matrix, generalizing the method to estimate parameters in matrix form can be problematic. Moreover, we have found that although the scores v_g(y, C_j) and v_c(y, C_j) satisfy (17), I_c(C_j) cannot be computed directly (see Sect. 3.1). We therefore simplify C_j by assuming that any two elements of y are mutually independent, so that C_j becomes diagonal. Define y = [y_1, …, y_m]^T and μ_j = [μ_j1, …, μ_jm]^T; then C_j is given by:

$$ \mathbf{C}_{j} = \left [ \begin{array}{c@{\quad}c@{\quad}c} \phi_{j1} & & \\ & \ddots & \\ & & \phi_{jm} \end{array} \right ] . $$
(63)

Now p_j becomes a product of m independent univariate normal densities:

$$ p_{j} = \prod_{i = 1}^{m} \frac{1}{\sqrt{2\pi \phi_{ji}}} \exp\biggl[ - \frac{( y_{i} - \mu_{ji} )^{2}}{2\phi_{ji}} \biggr]. $$
(64)

Then we have:

(65)

It is easy to verify that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\phi_{ji}) ] = 0\); we then compute the FIM of ϕ_ji as:

(66)

It is known that \(( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}} \sim\mathcal{N}( 0,1 )\), where \(\mathcal{N}( 0,1 )\) denotes the standard univariate normal distribution. Defining \(z = ( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}}\), the quantity \(z^{2}\) follows the \(\chi^{2}\)-distribution with one degree of freedom. By the properties of the \(\chi^{2}\)-distribution, \(\mathbb{E}( z^{2} ) = 1\) and \(\mathbb{D}( z^{2} ) = 2\); thus:

$$\mathbb{E} \bigl( z^{4} \bigr) = \mathbb{D} \bigl( z^{2} \bigr) + \mathbb{E}^{2} \bigl( z^{2} \bigr) = 3 . $$

Therefore,

$$ \left \{ \begin{array}{l} \mathbb{E}[ ( y_{i} - \mu_{ji} )^{4} ] =3\phi_{ji}^{2}, \\ \mathbb{E}[ ( y_{i} - \mu_{ji} )^{2} ] = \phi_{ji}. \\ \end{array} \right . $$
(67)

Substituting (67) into (66):

$$ \mathbf{I}_{c}(\phi_{ji}) = \frac{\omega_{j}}{2\phi_{ji}^{2}} . $$
(68)

We also derive that:

(69)

Note that the expectation of v_g(y, ϕ_ji) also satisfies (17). Now, combining (68) and (69) and placing the ϕ_ji on the diagonal of C_j, we obtain

(70)

which is equivalent to (23), and the proof of Theorem 2 is complete.
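For the diagonal covariance entries the same pattern holds. Assuming the score in (69) has the form v_g(y, ϕ_ji) = r_j[(y_i − μ_ji)² − ϕ_ji]/(2ϕ_ji²), which is consistent with (65), and using I_c(ϕ_ji) = ω_j/(2ϕ_ji²) from (68), the ϕ_ji² factors cancel and each diagonal entry is updated as ϕ_ji ← ϕ_ji + ρ (r_j/ω_j)[(y_i − μ_ji)² − ϕ_ji]. The Python sketch below is only an illustration of this step, with illustrative names.

import numpy as np

def update_diag_variances(y, mean_j, var_j, weight_j, resp_j, rho):
    # Titterington-style step for the diagonal variances phi_ji (illustrative sketch).
    # Assumed forms: I_c(phi_ji) = w_j / (2 phi_ji^2) as in (68) and
    # v_g(y, phi_ji) = r_j * ((y_i - mu_ji)^2 - phi_ji) / (2 phi_ji^2), so each entry
    # moves as phi_ji <- phi_ji + rho * (r_j / w_j) * ((y_i - mu_ji)^2 - phi_ji).
    y = np.asarray(y)
    return var_j + rho * (resp_j / weight_j) * ((y - mean_j) ** 2 - var_j)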

Appendix D: Proof of Theorem 3

By removing the redundant elements from the sufficient statistic vector formulated in (46) for the complete-data likelihood function of a multivariate normal mixture, the sufficient statistic vector s_j(x), which comprises three elements for the jth component of the mixture, is given by:

$$ \mathbf{s}_{j}(\mathbf{x}) = \bigl[ \delta_{j},\delta_{j}\mathbf{y},\delta_{j} \mathbf{yy}^{T} \bigr]^{T} = \bigl[ s_{j,1}( \mathbf{x}),s_{j,2}(\mathbf{x}),s_{j,3}(\mathbf{x}) \bigr]^{T} . $$
(71)

Then we can obtain the conditional expected values of the three statistics under a batch setting:

(72)
(73)
(74)

Substituting the above expectations into the original batch EM recursions (7)–(9), the parameters of a multivariate normal mixture can be expressed as:

(75)
(76)

and for C_j we have:

By substituting (76) to express μ_j in terms of the sufficient statistics, we obtain:

$$ \mathbf{C}_{j} = \frac{\overline{s_{j,3}}(\mathbf{x})}{\overline{s_{j,1}}(\mathbf{x})} - \frac{\overline{s_{j,2}}(\mathbf{x})\overline{s_{j,2}}^{T}(\mathbf{x})}{\overline{s_{j,1}}(\mathbf{x})\overline{s_{j,1}}(\mathbf{x})}. $$
(77)

The representation of the sufficient statistics in terms of the parameters follows directly from (75)–(77):

(78)
(79)
(80)

Equations (78)–(80) relate the current parameter estimates to the current sufficient statistics. The exponential forgetting technique is then applied to update the three sufficient statistics:

(81)
(82)
(83)

Recalling (75)–(77), at iteration k+1 we have,

(84)
(85)
(86)

Considering (78)–(80), we approximate \(s_{j,1}^{(k + 1)}\) by \(\omega_{j}^{(k + 1)}\) and \(s_{j,2}^{(k + 1)}\) by \(\omega_{j}^{(k + 1)}\boldsymbol{\mu}_{j}^{(k + 1)}\). After substituting all conditional expected values of the three statistics with the parameter values at epochs k and k+1, (40)–(42) are obtained and the proof is complete. A similar proof for applying this on-line EM algorithm to a mixture of Poisson distributions is given in [8].
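To make the sufficient statistics-based recursion concrete, the Python sketch below processes a single new observation: the conditional expectations of the three statistics, cf. (72)–(74), are the responsibility-weighted moments of the sample; they are blended with the previous statistics through exponential forgetting, cf. (81)–(83); and the parameters are then recovered, cf. (84)–(86). The forgetting factor is called gamma here and all names are illustrative; this is a sketch of the scheme under those assumptions, not the paper’s code.

import numpy as np
from scipy.stats import multivariate_normal

def online_em_step(y, stats, params, gamma):
    # One sufficient statistics-based on-line EM step for a K-component
    # multivariate normal mixture (illustrative sketch of (71)-(86)).
    # stats : dict with s1 of shape (K,), s2 of shape (K, m), s3 of shape (K, m, m)
    # params: dict with weights (K,), means (K, m), covs (K, m, m)
    # gamma : forgetting factor in (0, 1); symbol chosen here for illustration.
    y = np.asarray(y)
    w, mu, C = params["weights"], params["means"], params["covs"]
    K = len(w)

    # Conditional expectations of the statistics for the new sample, cf. (72)-(74):
    # posterior responsibilities times 1, y, and y y^T, respectively.
    lik = np.array([w[j] * multivariate_normal.pdf(y, mean=mu[j], cov=C[j]) for j in range(K)])
    r = lik / lik.sum()

    # Exponential forgetting of the sufficient statistics, cf. (81)-(83).
    stats["s1"] = (1 - gamma) * stats["s1"] + gamma * r
    stats["s2"] = (1 - gamma) * stats["s2"] + gamma * r[:, None] * y
    stats["s3"] = (1 - gamma) * stats["s3"] + gamma * r[:, None, None] * np.outer(y, y)

    # Recover the parameters from the statistics, cf. (84)-(86).
    params["weights"] = stats["s1"] / stats["s1"].sum()
    params["means"] = stats["s2"] / stats["s1"][:, None]
    params["covs"] = (stats["s3"] / stats["s1"][:, None, None]
                      - np.einsum("ki,kj->kij", params["means"], params["means"]))
    return stats, params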

Cite this article

Li, D., Xu, L. & Goodman, E. On-line EM Variants for Multivariate Normal Mixture Model in Background Learning and Moving Foreground Detection. J Math Imaging Vis 48, 114–133 (2014). https://doi.org/10.1007/s10851-012-0403-6
