On-line EM Variants for Multivariate Normal Mixture Model in Background Learning and Moving Foreground Detection

Abstract

The unsupervised learning of multivariate mixture models from on-line data streams has attracted the attention of researchers because of its usefulness in real-time intelligent learning systems. Compared with some popular numerical methods, the EM algorithm is an ideal choice for iteratively obtaining maximum likelihood estimates of the parameters of an assumed finite mixture. However, the original EM is a batch algorithm that works only on fixed datasets. To endow the EM algorithm with the ability to process streaming data, two on-line variants are studied: Titterington’s method and a sufficient statistics-based method. We first prove that the two on-line EM variants are theoretically feasible for training the multivariate normal mixture model by showing that the model belongs to the exponential family. The two on-line learning schemes for multivariate normal mixtures are then applied to the problems of background learning and moving foreground detection. Experiments show that the two on-line EM variants efficiently update the parameters of the mixture model and generate reliable backgrounds for moving foreground detection.

References

  1. Figueiredo, M.A.T., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)

  2. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, Stat. Methodol. 39(1), 1–38 (1977)

  3. Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)

  4. Titterington, D.M.: Recursive parameter estimation using incomplete data. J. R. Stat. Soc., Ser. B, Methodol. 46(2), 257–267 (1984)

  5. Fabian, V.: On asymptotically efficient recursive estimation. Ann. Stat. 6, 854–866 (1978)

  6. Zivkovic, Z., van der Heijden, F.: Recursive unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 651–656 (2004)

  7. Neal, R., Hinton, G.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan, M.I. (ed.) Learning in Graphical Models, pp. 355–368. Kluwer Academic, Dordrecht (1998)

  8. Cappe, O., Moulines, E.: On-line expectation-maximization algorithm for latent data models. J. R. Stat. Soc., Ser. B, Methodol. 71, 593–613 (2009)

  9. Arcidiacono, P., Jones, J.B.: Finite mixture distributions, sequential likelihood and the EM algorithm. Econometrica 71(3), 933–946 (2003)

  10. Xu, L., Jordan, M.: On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comput. 8, 129–151 (1996)

  11. Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood, and the EM algorithm. SIAM Rev. 26, 195–239 (1984)

  12. Sato, M., Ishii, S.: On-line EM algorithm for the normalized Gaussian network. Neural Comput. 12, 407–432 (2000)

  13. Makov, U.E., Smith, A.F.M.: A quasi-Bayes unsupervised learning procedure for priors. IEEE Trans. Inf. Theory 23(6), 761–764 (1977)

  14. Smith, A.F.M., Makov, U.E.: A quasi-Bayes sequential procedure for mixtures. J. R. Stat. Soc., Ser. B, Methodol. 40, 106–112 (1978)

  15. Wang, S., Zhao, Y.: Almost sure convergence of Titterington’s recursive estimator for mixture models. Stat. Probab. Lett. 76, 2001–2006 (2006)

  16. Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780–785 (1997)

  17. Stauffer, C., Grimson, W.E.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 246–252 (1999)

  18. Stauffer, C., Grimson, W.E.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–757 (2000)

  19. Zivkovic, Z.: Improved adaptive Gaussian mixture model for background subtraction. In: Proc. 17th Int. Conf. Pattern Recognition, pp. 28–31 (2004)

  20. Li, D., Xu, L., Goodman, E.: On-line background learning for illumination-robust foreground detection. In: Proc. 11th ICARCV, pp. 1093–1100 (2010)

  21. Available at: http://www.cvg.rdg.ac.uk/slides/pets.html

  22. Goyette, N., Jodoin, P.-M., Porikli, F., Konrad, J., Ishwar, P.: Changedetection.net: a new change detection benchmark dataset. In: Proc. IEEE Workshop on Change Detection (CDW’12) at CVPR’12, pp. 1–8 (2012)

  23. Cheng, J., Yang, J., Zhou, Y., Cui, Y.: Flexible background mixture models for foreground segmentation. Image Vis. Comput. 24, 473–482 (2006)

  24. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time foreground-background segmentation using codebook model. Real-Time Imaging 11(3), 167–256 (2005)

  25. Chang, F., Chen, C., Lu, C.: A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93(2), 206–220 (2004)

  26. Joo, S., Chellappa, R.: A multiple-hypothesis approach for multiobject visual tracking. IEEE Trans. Image Process. 16(11), 2849–2854 (2007)

Corresponding author

Correspondence to Lihong Xu.

Appendices

Appendix A: Proof of Theorem 1

The complete-data density (11) can be transformed into:

(45)

Decomposing (45) gives the following identifications:

$$ \mathbf{s}_{j}(\mathbf{x}) = \bigl[ \delta_{j},\delta_{j},\delta_{j}\mathit{vec} \bigl( \mathbf{yy}^{T} \bigr),\delta_{j}\mathbf{y}, \delta_{j} \bigr]^{T}, $$
(46)
(47)
$$ c ( \mathbf{x} ) = \sum_{j = 1}^{K} - \frac{m}{2}\delta_{j}\log ( 2\pi ) , $$
(48)
$$ b ( \boldsymbol{\theta} ) = 0 . $$
(49)

Thus, according to Definition 2, f(x|θ) as defined in (11) belongs to the exponential family.
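For concreteness, the grouping stated in (45)–(49) can be written out explicitly. The display below is only a sketch: it assumes that Definition 2 takes the standard exponential-family form \(f( \mathbf{x}|\boldsymbol{\theta} ) = \exp \{ \sum_{j} \mathbf{s}_{j}(\mathbf{x})^{T}\boldsymbol{\phi}_{j}(\boldsymbol{\theta}) + c(\mathbf{x}) + b(\boldsymbol{\theta}) \}\), and the natural-parameter vector \(\boldsymbol{\phi}_{j}(\boldsymbol{\theta})\) shown is simply the pairing implied by (46), (48) and (49); its symbol and layout may differ from those actually used in (45) and (47).

% Sketch only: exponential-family grouping implied by (46), (48), (49),
% assuming the standard form of Definition 2 stated above.
$$ f( \mathbf{x}|\boldsymbol{\theta} ) = \exp\Biggl\{ \sum_{j = 1}^{K} \delta_{j}\Bigl[ \log \omega_{j} - \tfrac{1}{2}\log \vert \mathbf{C}_{j} \vert - \tfrac{1}{2}\mathit{vec}\bigl( \mathbf{C}_{j}^{ - 1} \bigr)^{T}\mathit{vec}\bigl( \mathbf{yy}^{T} \bigr) + \mathbf{y}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} - \tfrac{1}{2}\boldsymbol{\mu}_{j}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} - \tfrac{m}{2}\log ( 2\pi ) \Bigr] \Biggr\}, $$
$$ \boldsymbol{\phi}_{j}(\boldsymbol{\theta}) = \Bigl[ \log \omega_{j},\ - \tfrac{1}{2}\log \vert \mathbf{C}_{j} \vert,\ - \tfrac{1}{2}\mathit{vec}\bigl( \mathbf{C}_{j}^{ - 1} \bigr),\ \mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j},\ - \tfrac{1}{2}\boldsymbol{\mu}_{j}^{T}\mathbf{C}_{j}^{ - 1}\boldsymbol{\mu}_{j} \Bigr]^{T}. $$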

Appendix B: The Conditional Expectation of Score Functions

According to the definition of conditional expectation:

$$ \mathbb{E}_{\boldsymbol{\theta }} [ \mathbf{v} ] = \int\frac{1}{L(\boldsymbol{\theta} |\mathbf{y})} \cdot \frac{\partial L(\boldsymbol{\theta} |\mathbf{y})}{\partial \boldsymbol{\theta}} L(\boldsymbol{\theta} |\mathbf{y})\,dy . $$
(50)

The differentiation with respect to θ does not involve the integration variable y, so the two operations can be interchanged, and we have:

$$ \mathbb{E}_{\boldsymbol{\theta }} [ \mathbf{v} ] = \int\frac{\partial L(\boldsymbol{\theta} |\mathbf{y})}{\partial \boldsymbol{\theta}}\,dy = \frac{\partial}{\partial \boldsymbol{\theta}} \int L(\boldsymbol{\theta} |\mathbf{y})\,dy = \frac{\partial 1}{\partial \boldsymbol{\theta}} = \mathbf{0} . $$
(51)
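As a purely illustrative numerical check of this zero-mean property (not part of the original appendix), the short Python sketch below estimates the expected score of a univariate normal with respect to its mean by Monte Carlo; the sample average is close to zero when the score is evaluated at the true parameter value.

import numpy as np

# Monte Carlo check of E_theta[v] = 0 for a univariate normal N(mu, sigma^2):
# the score with respect to mu is v(y) = (y - mu) / sigma^2.
# Purely illustrative; not part of the original proof.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
y = rng.normal(mu, sigma, size=200_000)
score = (y - mu) / sigma**2
print("average score:", score.mean())  # approximately 0, as required by (17)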

Appendix C: Proof of Theorem 2

According to (11), the logarithm of the complete-data density is:

$$ \log f( \mathbf{x}|\boldsymbol{\theta} ) = \sum_{j = 1}^{K} \delta_{j}\log \bigl( \omega_{j}p_{j}( \mathbf{y}|\boldsymbol{\theta}_{j} ) \bigr) . $$
(52)

The parameters to be estimated are ω_j, μ_j, and C_j, which are mutually independent. Therefore the FIM of θ has a diagonal form, and we derive the components of I_c(θ) separately in the following three parts.

C.1 Derivation of Titterington’s Equation for ω_j

To compute I_c(ω), without loss of generality we take ω_1 as an example. Using the constraint \(\sum_{j = 1}^{K} \omega_{j} = 1\) and abbreviating \(p_{j}( \mathbf{y}|\boldsymbol{\theta}_{j} )\) to p_j, Eq. (52) can be rewritten in two forms:

(53)

and

(54)

Taking the derivative of (53) and (54) with respect to ω_1, we obtain:

$$ \mathbf{v}_{c}(\mathbf{y},\omega_{1}) = \frac{\partial \log f( \mathbf{x}|\boldsymbol{\theta} )}{\partial \omega_{1}} = \frac{\delta_{1}}{\omega_{1}} - \frac{\delta_{2}}{\omega_{2}} = \frac{\delta_{1}}{\omega_{1}} - \frac{\delta_{3}}{\omega_{3}}. $$
(55)

The Kronecker delta δ_j is a discrete random variable with the distribution listed in Table 2.

Table 2 The discrete probability distribution of δ_j: P(δ_j = 1) = ω_j and P(δ_j = 0) = 1 − ω_j

From Table 2 we have \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\omega_{1}) ] = 0\), which satisfies the property of scores in (17). Then I_c(ω_1) is constructed as:

Since the choice of ω_1 in (53)–(55) was arbitrary, we have in general:

$$ \mathbf{I}_{c}(\omega_{j}) = \frac{1}{\omega_{j}}. $$
(56)

From (56) we can see that I_c(ω_j) is a scalar. In order to obtain the incomplete-data score function v_g(y, ω_j), we modify the incomplete-data density g(y|θ) by adding a zero-valued term, in a manner analogous to constructing a Lagrange function; the modified incomplete-data density is denoted by g_s(y|θ) and is always equal in value to g(y|θ):

(57)

Then it can be derived that:

(58)

According to the properties of a probability density function, the expectation of (58) is computed as:

which also satisfies (17). Now, by substituting (56) and (58) into the recursion (13), we obtain the updating equation for ω_j formulated in (21).
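Since Eq. (21) itself lies outside this excerpt, the Python sketch below only illustrates the step that the recursion (13) yields for the weights under the quantities derived above. It assumes the incomplete-data score takes the usual responsibility form v_g(y, ω_j) = (r_j − ω_j)/ω_j, where r_j is the posterior responsibility of component j; combined with I_c(ω_j) = 1/ω_j from (56), the step θ ← θ + ρ I_c^{-1} v_g reduces to a small move of ω_j toward r_j, with gain ρ. The function and variable names (update_weights, rho, resp) are illustrative, not the paper’s notation.

import numpy as np
from scipy.stats import multivariate_normal

def update_weights(y, weights, means, covs, rho):
    # One Titterington-style step for the mixing weights (illustrative sketch).
    # Assumed forms: I_c(w_j) = 1/w_j as in (56) and v_g(y, w_j) = (r_j - w_j)/w_j,
    # so the recursion (13) reduces to w_j <- w_j + rho * (r_j - w_j).
    lik = np.array([w * multivariate_normal.pdf(y, mean=m, cov=c)
                    for w, m, c in zip(weights, means, covs)])
    resp = lik / lik.sum()              # posterior responsibilities r_j
    new_w = weights + rho * (resp - weights)
    return new_w / new_w.sum()          # renormalize to guard against drift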

C.2 Derivation of Titterington’s Equation for μ_j

To update the mean vectors, I_c(μ) must be obtained first. Taking the first derivative of (52) with respect to μ_j:

(59)

It is obvious that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\boldsymbol{\mu}_{j}) ] = 0\), which satisfies (17). Then I_c(μ_j) is given by:

(60)

Then the score function v_g(y, μ_j) is constructed as follows:

(61)

The expectation of (61) is,

(62)

which also satisfies (17). By substituting (60) and (61) into (13), we obtain the recursive estimate of μ given in (22).
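A companion sketch for the mean recursion follows (again, Eq. (22) itself is not reproduced here). It assumes the standard forms I_c(μ_j) = ω_j C_j^{-1} for (60) and v_g(y, μ_j) = r_j C_j^{-1}(y − μ_j) for (61), where r_j is the posterior responsibility; the C_j^{-1} factors then cancel in I_c^{-1} v_g and the step becomes μ_j ← μ_j + ρ (r_j/ω_j)(y − μ_j). Names are illustrative, not the paper’s notation.

import numpy as np

def update_mean(y, mean_j, weight_j, resp_j, rho):
    # Titterington-style step for one component mean (illustrative sketch).
    # Assumed forms: I_c(mu_j) = w_j * inv(C_j) and
    # v_g(y, mu_j) = r_j * inv(C_j) @ (y - mu_j), so I_c^{-1} v_g = (r_j / w_j) * (y - mu_j)
    # and (13) gives mu_j <- mu_j + rho * (r_j / w_j) * (y - mu_j).
    return mean_j + rho * (resp_j / weight_j) * (np.asarray(y) - mean_j)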

C.3 Derivation of Titterington’s Equation for C_j

As the behavior of Titterington’s method is unclear for score functions taken with respect to a matrix, generalizing the method to estimate parameters in matrix form can be problematic. Moreover, we have found that although the scores v_g(y, C_j) and v_c(y, C_j) satisfy (17), I_c(C_j) cannot be computed directly (see Sect. 3.1). We therefore simplify C_j by assuming that any two elements of y are mutually independent, so that C_j becomes diagonal. Define y = [y_1, …, y_m]^T and μ_j = [μ_j1, …, μ_jm]^T; then C_j is given by:

$$ \mathbf{C}_{j} = \left [ \begin{array}{c@{\quad}c@{\quad}c} \phi_{j1} & & \\ & \ddots & \\ & & \phi_{jm} \end{array} \right ] . $$
(63)

Now p_j becomes a product of m independent univariate normal densities:

$$ p_{j} = \prod_{i = 1}^{m} \frac{1}{\sqrt{2\pi \phi_{ji}}} \exp\biggl[ - \frac{( y_{i} - \mu_{ji} )^{2}}{2\phi_{ji}} \biggr]. $$
(64)

Then we have:

(65)

It is easy to verify that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\phi_{ji}) ] = 0\); we then compute the FIM of ϕ_ji as:

(66)

It is known that \(( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}} \sim\mathcal{N}( 0,1 )\), where \(\mathcal{N}( 0,1 )\) denotes the standard univariate normal distribution. Defining \(z = ( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}}\), the quantity \(z^{2}\) follows the \(\chi^{2}\)-distribution with one degree of freedom. By the properties of the \(\chi^{2}\)-distribution, \(\mathbb{E}( z^{2} ) = 1\) and \(\mathbb{D}( z^{2} ) = 2\); thus:

$$\mathbb{E} \bigl( z^{4} \bigr) = \mathbb{D} \bigl( z^{2} \bigr) + \mathbb{E}^{2} \bigl( z^{2} \bigr) = 3 . $$

Therefore,

$$ \left \{ \begin{array}{l} \mathbb{E}[ ( y_{i} - \mu_{ji} )^{4} ] =3\phi_{ji}^{2}, \\ \mathbb{E}[ ( y_{i} - \mu_{ji} )^{2} ] = \phi_{ji}. \\ \end{array} \right . $$
(67)

Substituting (67) into (66):

$$ \mathbf{I}_{c}(\phi_{ji}) = \frac{\omega_{j}}{2\phi_{ji}^{2}} . $$
(68)

We also derive that:

(69)

Note that the expectation of v_g(y, ϕ_ji) also satisfies (17). Now, combining (68) and (69) and placing the ϕ_ji on the diagonal of C_j, we obtain

(70)

which is equivalent to (23), and the proof of Theorem 2 is complete.
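For the diagonal covariance entries the same pattern holds. Assuming the score in (69) has the form v_g(y, ϕ_ji) = r_j[(y_i − μ_ji)² − ϕ_ji]/(2ϕ_ji²), which is consistent with (65), and using I_c(ϕ_ji) = ω_j/(2ϕ_ji²) from (68), the ϕ_ji² factors cancel and each diagonal entry is updated as ϕ_ji ← ϕ_ji + ρ (r_j/ω_j)[(y_i − μ_ji)² − ϕ_ji]. The Python sketch below is only an illustration of this step, with illustrative names.

import numpy as np

def update_diag_variances(y, mean_j, var_j, weight_j, resp_j, rho):
    # Titterington-style step for the diagonal variances phi_ji (illustrative sketch).
    # Assumed forms: I_c(phi_ji) = w_j / (2 phi_ji^2) as in (68) and
    # v_g(y, phi_ji) = r_j * ((y_i - mu_ji)^2 - phi_ji) / (2 phi_ji^2), so each entry
    # moves as phi_ji <- phi_ji + rho * (r_j / w_j) * ((y_i - mu_ji)^2 - phi_ji).
    y = np.asarray(y)
    return var_j + rho * (resp_j / weight_j) * ((y - mean_j) ** 2 - var_j)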

Appendix D: Proof of Theorem 3

By removing the redundant elements from the sufficient statistic vector formulated in (46) for the complete-data likelihood function of a multivariate normal mixture, the sufficient statistic vector s_j(x), which comprises three elements for the jth component of the mixture, is given by:

$$ \mathbf{s}_{j}(\mathbf{x}) = \bigl[ \delta_{j},\delta_{j}\mathbf{y},\delta_{j} \mathbf{yy}^{T} \bigr]^{T} = \bigl[ s_{j,1}( \mathbf{x}),s_{j,2}(\mathbf{x}),s_{j,3}(\mathbf{x}) \bigr]^{T} . $$
(71)

Then we can obtain the conditional expected values of the three statistics under a batch setting:

(72)
(73)
(74)

Substituting the above expectations into the original batch EM recursions (7)–(9), the parameters of a multivariate normal mixture can be expressed as:

(75)
(76)

and for C_j we have:

By substituting (76) to express μ_j in terms of the sufficient statistics, we obtain:

$$ \mathbf{C}_{j} = \frac{\overline{s_{j,3}}(\mathbf{x})}{\overline{s_{j,1}}(\mathbf{x})} - \frac{\overline{s_{j,2}}(\mathbf{x})\overline{s_{j,2}}^{T}(\mathbf{x})}{\overline{s_{j,1}}(\mathbf{x})\overline{s_{j,1}}(\mathbf{x})}. $$
(77)

The representation of the sufficient statistics in terms of the parameters follows directly from (75)–(77):

(78)
(79)
(80)

Equations (78)–(80) relate the current parameter estimates to the current sufficient statistics. The exponential forgetting technique is then applied to update the three sufficient statistics:

(81)
(82)
(83)

Recalling (75)–(77), at iteration k+1 we have,

(84)
(85)
(86)

Considering (78)–(80), we approximate \(s_{j,1}^{(k + 1)}\) by \(\omega_{j}^{(k + 1)}\) and \(s_{j,2}^{(k + 1)}\) by \(\omega_{j}^{(k + 1)}\boldsymbol{\mu}_{j}^{(k + 1)}\). After substituting all conditional expected values of the three statistics with the parameter values at epochs k and k+1, (40)–(42) are obtained and the proof is complete. A similar proof for applying this on-line EM algorithm to a mixture of Poisson distributions is given in [8].
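To make the sufficient statistics-based recursion concrete, the Python sketch below processes a single new observation: the conditional expectations of the three statistics, cf. (72)–(74), are the responsibility-weighted moments of the sample; they are blended with the previous statistics through exponential forgetting, cf. (81)–(83); and the parameters are then recovered, cf. (84)–(86). The forgetting factor is called gamma here and all names are illustrative; this is a sketch of the scheme under those assumptions, not the paper’s code.

import numpy as np
from scipy.stats import multivariate_normal

def online_em_step(y, stats, params, gamma):
    # One sufficient statistics-based on-line EM step for a K-component
    # multivariate normal mixture (illustrative sketch of (71)-(86)).
    # stats : dict with s1 of shape (K,), s2 of shape (K, m), s3 of shape (K, m, m)
    # params: dict with weights (K,), means (K, m), covs (K, m, m)
    # gamma : forgetting factor in (0, 1); symbol chosen here for illustration.
    y = np.asarray(y)
    w, mu, C = params["weights"], params["means"], params["covs"]
    K = len(w)

    # Conditional expectations of the statistics for the new sample, cf. (72)-(74):
    # posterior responsibilities times 1, y, and y y^T, respectively.
    lik = np.array([w[j] * multivariate_normal.pdf(y, mean=mu[j], cov=C[j]) for j in range(K)])
    r = lik / lik.sum()

    # Exponential forgetting of the sufficient statistics, cf. (81)-(83).
    stats["s1"] = (1 - gamma) * stats["s1"] + gamma * r
    stats["s2"] = (1 - gamma) * stats["s2"] + gamma * r[:, None] * y
    stats["s3"] = (1 - gamma) * stats["s3"] + gamma * r[:, None, None] * np.outer(y, y)

    # Recover the parameters from the statistics, cf. (84)-(86).
    params["weights"] = stats["s1"] / stats["s1"].sum()
    params["means"] = stats["s2"] / stats["s1"][:, None]
    params["covs"] = (stats["s3"] / stats["s1"][:, None, None]
                      - np.einsum("ki,kj->kij", params["means"], params["means"]))
    return stats, params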

Cite this article

Li, D., Xu, L. & Goodman, E. On-line EM Variants for Multivariate Normal Mixture Model in Background Learning and Moving Foreground Detection. J Math Imaging Vis 48, 114–133 (2014). https://doi.org/10.1007/s10851-012-0403-6
