Online variational learning of finite Dirichlet mixture models

Abstract

In this paper, we present an online variational inference algorithm for learning finite Dirichlet mixture models. Online algorithms process data points one at a time, which is important for real-time applications and for large-scale data sets where batch processing of all points at once is infeasible. By adopting the variational Bayes framework in an online manner, all the involved parameters and the model complexity (i.e. the number of components) of the Dirichlet mixture model can be estimated simultaneously in closed form. The proposed algorithm is validated on both synthetic data sets and a challenging real-world application, namely video background subtraction.

Notes

  1. An interesting theoretical study of the convergence of variational Bayes approaches can be found in Wang and Titterington (2005).

References

  • Allili M, Bouguila N, Ziou D (2008) Finite general Gaussian mixture modeling and application to image and video foreground segmentation. J Electron Imaging 17(1):1–13

  • Allili MS, Ziou D, Bouguila N, Boutemedjet S (2010) Image and video segmentation by combining unsupervised generalized Gaussian mixture modeling and feature selection. IEEE Trans Circ Syst Video Technol 20(10):1373–1377

  • Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10:251–276

  • Attias H (1999) A variational Bayes framework for graphical models. In: Proceedings of advances in neural information processing systems (NIPS), pp 209–215

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin

  • Bishop CM, Lawrence N, Jaakkola T, Jordan MI (1997) Approximating posterior distributions in belief networks using mixtures. In: Proceedings of advances in neural information processing systems (NIPS)

  • Bottou L (1999) Online learning and stochastic approximations. In: On-line learning in neural networks, pp 9–42

  • Bouguila N, Ziou D (2004) Improving content based image retrieval systems using finite multinomial Dirichlet mixture. In: Proceedings of the 14th IEEE workshop on machine learning for signal processing, Sao Luis, Brazil, pp 23–32

  • Bouguila N, Ziou D (2005a) MML-based approach for finite Dirichlet mixture estimation and selection. In: Proceedings of the 4th international conference on machine learning and data mining in pattern recognition (MLDM), LNAI 3587. Springer, Berlin, pp 42–51

  • Bouguila N, Ziou D (2005b) On fitting finite Dirichlet mixture using ECM and MML. In: Singh S et al (eds) Pattern recognition and data mining, third international conference on advances in pattern recognition, ICAPR (1). LNCS 3686. Springer, Berlin, pp 172–182

  • Bouguila N, Ziou D (2005c) A probabilistic approach for shadows modeling and detection. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 329–332

  • Bouguila N, Ziou D (2005d) Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications. Pattern Recognit Lett 26(12):1916–1925

  • Bouguila N, Ziou D (2006a) Online clustering via finite mixtures of Dirichlet and minimum message length. Eng Appl Artif Intell 19(4):371–379

  • Bouguila N, Ziou D (2006b) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009

  • Bouguila N, Ziou D (2007) Unsupervised learning of a finite discrete mixture: applications to texture modeling and image databases summarization. J Vis Commun Image Represent 18(4):295–309

  • Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543

  • Bouguila N, Wang JH, Hamza AB (2010) Software modules categorization through likelihood and Bayesian analysis of finite Dirichlet mixtures. J Appl Stat 37(2):235–252

  • Corduneanu A, Bishop CM (2001) Variational Bayesian model selection for mixture distributions. In: Proceedings of the 8th international conference on artificial intelligence and statistics (AISTAT), pp 27–34

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38

  • Diaconis P, Ylvisaker D (1979) Conjugate priors for exponential families. Ann Stat 7:269–281

  • Fujimaki R, Sogawa Y, Morinaga S (2011) Online heterogeneous mixture modeling with marginal and copula selection. In: KDD, pp 645–653

  • Hoffman MD, Blei DM, Bach F (2010) Online learning for latent Dirichlet allocation. In: Proceedings of neural information processing systems (NIPS)

  • Jordan MI, Ghahramani Z, Jaakkola T, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233

  • Kushner H, Yin G (1997) Stochastic approximation algorithms and applications. Applications of mathematics, Springer, Berlin

  • Lawrence ND, Bishop CM, Jordan MI (1998) Mixture representations for inference and learning in Boltzmann machines. In: Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI), pp 320–327

  • Lee DS (2005) Effective Gaussian mixture learning for video background subtraction. IEEE Trans Pattern Anal Mach Intell 27(5):827–832

  • Ma Z, Leijon A (2011) Bayesian estimation of beta mixture models with variational inference. IEEE Trans Pattern Anal Mach Intell 33(11):2160–2173

  • Meier T, Ngan K (1998) Automatic segmentation of moving objects for video object plane generation. IEEE Trans Circ Syst Video Technol 8(5):525–538

  • Nasios N, Bors AG (2006) Variational learning for Gaussian mixture models. IEEE Trans Syst Man Cybern B Cybern 36(4):849–862

  • Parisi G (1988) Statistical field theory. Addison-Wesley

  • Piccardi M (2004) Background subtraction techniques: a review. In: IEEE international conference on systems, man and cybernetics (SMC), vol 4, pp 3099–3104

  • Robert C (2001) The Bayesian choice. Springer, Berlin

  • Robert C, Casella G (1999) Monte Carlo statistical methods. Springer, Berlin

  • Sato MA (2001) Online model selection based on the variational Bayes. Neural Comput 13:1649–1681

  • Stauffer C, Grimson W (1999) Adaptive background mixture models for real-time tracking. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 246–252

  • Stauffer C, Grimson W (2000) Learning patterns of activity using real-time tracking. IEEE Trans Pattern Anal Mach Intell 22(8):747–757

  • Stiller C (1997) Object-based estimation of dense motion fields. IEEE Trans Image Process 6(2):234–250

  • Wang J, Adelson E (1994) Representing moving images with layers. IEEE Trans Image Process 3(5):625–638

  • Wang B, Titterington DM (2004) Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. In: Proceedings of the 20th conference on uncertainty in artificial intelligence (UAI), pp 577–584

  • Wang B, Titterington DM (2005) Inadequacy of interval estimates corresponding to variational Bayesian approximations. In: Proceedings of the international conference on artificial intelligence and statistics (AISTAT), pp 373–380

  • Wang C, Paisley JW, Blei DM (2011) Online variational inference for the hierarchical Dirichlet process. In: Proceedings of the 14th international conference on artificial intelligence and statistics (AISTAT)

  • Woolrich MW, Behrens TE (2006) Variational Bayes inference of spatial mixture models for segmentation. IEEE Trans Med Imaging 25(10):1380–1391

  • Zivkovic Z, van der Heijden F (2004) Recursive unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 26:651–656

  • Zivkovic Z, van der Heijden F (2006) Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit Lett 27:773–780

Acknowledgments

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Nizar Bouguila.

Appendices

Appendix 1: Proof of Equation (15)

The variational parameter \(r_{ij}\) is calculated by setting the derivative of \(\mathcal{L}(Q)\) with respect to \(r_{ij}\) to 0. Here we must take into account the constraint \(\sum_{j=1}^{M} r_{ij} = 1\), which can be enforced by adding a Lagrange multiplier to \(\mathcal{L}(Q)\). Taking the derivative with respect to \(r_{ij}\) and setting the result to zero, we get

$$ \begin{aligned} \frac{\partial {\mathcal{L}}(Q)}{\partial r_{ij}} =\,& 0\\ =\,& \frac{\partial}{\partial r_{ij}} \left\{ \sum_{i=1}^{N}\sum_{j=1}^Mr_{ij}\left[\ln\pi_{j} + {\mathcal{R}}_{j}+ \sum_{l=1}^{D}(\bar{\alpha}_{jl}-1)\ln X_{il}\right.\right.\\ &\qquad\quad\left. \left.-\ln r_{ij}\vphantom{\sum_{l=1}^{D}}\right] + \lambda\left(\sum_{j=1}^Mr_{ij}-1\right)\right\}\\ =\,& {\mathcal{R}}_{j}+ \sum_{l=1}^{D}(\bar{\alpha}_{jl}-1)\ln X_{il} + \ln\pi_{j} -(\ln r_{ij}+1) + \lambda\\ \end{aligned} $$
(39)

where

$$ {\mathcal{R}}_j = \left\langle\ln\frac{\Upgamma(\sum_{l=1}^D \alpha_{jl})}{\prod_{l=1}^D\Upgamma(\alpha_{jl})}\right\rangle \;, \qquad \bar{\alpha}_{jl} = \left\langle\alpha_{jl}\right\rangle=\frac{u_{jl}}{v_{jl}} $$
(40)

Unfortunately, a closed-form expression cannot be found for \(\mathcal{R}_{j}\), so variational inference cannot be applied directly. Here, we approximate \(\mathcal{R}_j\) by a second-order Taylor expansion about the expected values of the parameters \(\varvec{\alpha}_j\). This approximation is denoted \(\mathcal{\widetilde{R}}_j\) and is defined in (16). By substituting \(\mathcal{\widetilde{R}}_j\) for \(\mathcal{R}_j\) in (39) and applying some algebra, we obtain

$$ \lambda = 1- \ln \sum\limits_{j=1}^{M} \exp\left({\mathcal{\widetilde{R}}}_{j}+ \sum\limits_{l=1}^{D}(\bar{\alpha}_{jl}-1)\ln X_{il}+\ln\pi_{j}\right) $$
(41)

By substituting (41) back into (39), we then obtain

$$ r_{ij} = \frac{\exp \left[\ln\pi_j + {\mathcal{\widetilde{R}}}_j+ \sum_{l=1}^{D}(\bar{\alpha}_{jl}-1)\ln X_{il}\right]}{\sum_{j=1}^{M}\exp \left[\ln\pi_{j} + {\mathcal{\widetilde{R}}}_j+ \sum_{l=1}^{D}(\bar{\alpha}_{jl}-1)\ln X_{il}\right]} $$
(42)
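
For illustration, here is a minimal NumPy sketch of the update (42); it is not the authors' implementation. The Taylor approximation \(\mathcal{\widetilde{R}}_j\) of (16) is passed in as a precomputed array, and the names (log_X, log_pi, alpha_bar, R_tilde) are hypothetical variables introduced only for this example.

```python
import numpy as np

def responsibilities(log_X, log_pi, alpha_bar, R_tilde):
    """Eq. (42): r_ij proportional to exp[ln(pi_j) + R~_j + sum_l (alpha_bar_jl - 1) ln X_il].

    log_X     : (N, D) array of ln X_il
    log_pi    : (M,)   array of ln pi_j
    alpha_bar : (M, D) array of expected parameters alpha_bar_jl = u_jl / v_jl
    R_tilde   : (M,)   second-order Taylor approximation of R_j, Eq. (16)
    """
    # Log of the unnormalised responsibilities, shape (N, M)
    log_rho = log_pi + R_tilde + log_X @ (alpha_bar - 1.0).T
    # Normalise over the M components using the log-sum-exp trick for stability
    log_rho -= log_rho.max(axis=1, keepdims=True)
    r = np.exp(log_rho)
    return r / r.sum(axis=1, keepdims=True)
```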

Appendix 2: Proof of Equation (17)

The values of the mixing coefficients \(\varvec{\pi}\) are calculated by maximizing the lower bound with respect to \(\varvec{\pi}\). A Lagrange term is added to the lower bound to enforce the constraint \(\sum_{j=1}^{M} \pi_{j} = 1\). Taking the derivative with respect to \(\pi_{j}\) and setting the result to zero, we have

$$ \begin{aligned} \frac{\partial {\mathcal{L}}(Q)}{\partial \pi_{j}} = &\frac{\partial}{\partial \pi_{j}} \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{M}r_{ij}\,\hbox{ln}\,\pi_{j} + \lambda\left(\sum_{j=1}^{M}\pi_{j} -1\right)\\ = & \sum_{i=1}^{N}r_{ij}(1/\pi_{j}) + \lambda = 0 \\ \end{aligned} $$
(43)
$$ \Longrightarrow\quad \sum_{i=1}^{N}r_{ij} = -\lambda\pi_{j} $$
(44)

Summing both sides of (44) over \(j\), we obtain \(\lambda = -N\). Substituting this value of \(\lambda\) back into (43), we then obtain

$$ \pi_j = \frac{1}{N}\sum\limits_{i=1}^{N}r_{ij} $$
(45)
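
In code, (45) is simply the column average of the responsibility matrix; a one-line sketch under the same hypothetical variable names as above:

```python
import numpy as np

def update_mixing_coefficients(r):
    """Eq. (45): pi_j = (1/N) sum_i r_ij for an (N, M) responsibility matrix r."""
    return r.mean(axis=0)
```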

Appendix 3: Proof of Equations (18) and (19)

For the variational factor \(Q(\alpha_{jl})\), instead of using the gradient method, it is more straightforward to compute the variational solution using (11). The logarithm of \(Q(\alpha_{jl})\) is then given by

$$ \begin{aligned} \ln Q(\alpha_{jl}) =\,& \left\langle {\mathbb{L}}({\mathcal{X}},\Uptheta)\right\rangle_{\Uptheta \neq \alpha_{jl}}\\ =& \sum\limits_{i=1}^{N} \left\langle Z_{ij}\right\rangle \left\langle\ln\frac{\Upgamma(\sum_{s=1}^{D} \alpha_{js})}{\prod_{s=1}^{D}\Upgamma(\alpha_{js})}\right\rangle_{\Uptheta \neq \alpha_{jl}}\\ & + \alpha_{jl}\sum_{i=1}^{N} \left\langle Z_{ij}\right\rangle\ln X_{il} + (u_{jl}-1)\hbox{ln}\alpha_{jl}\\ & -v_{jl}\alpha_{jl}+ \hbox{const.}\\ =& \sum_{i=1}^{N} r_{ij} {\mathcal{J}}(\alpha_{jl})+ \alpha_{jl} \sum_{i=1}^{N}r_{ij}\ln X_{il}\\ & + (u_{jl}-1)\ln\alpha_{jl} -v_{jl}\alpha_{jl} + \hbox{const.}\\ \end{aligned} $$
(46)

where

$$ {\mathcal{J}}(\alpha_{jl}) = \left\langle\ln\frac{\Upgamma(\alpha_l + \sum_{s\neq l}^{D} \alpha_{js})}{\Upgamma(\alpha_l)\prod_{s \neq l}^{D}\Upgamma(\alpha_{js})}\right\rangle_{\Uptheta \neq \alpha_{jl}} $$
(47)

Notice that \(\mathcal{J}(\alpha_{jl})\) is a function of \(\alpha_{jl}\) and is unfortunately analytically intractable. We can obtain an approximate lower bound by applying a first-order Taylor expansion about \(\bar{\alpha}_{jl}\) (the expected value of \(\alpha_{jl}\)) as

$$ \begin{aligned} {\mathcal{J}}(\alpha_{jl})\geq& \bar{\alpha}_{jl} \hbox{ln} \alpha_{jl} \biggl[\Uppsi\biggl(\sum_{s=1}^{D}\bar{\alpha}_{js}\biggr) - \Uppsi(\bar{\alpha}_{jl}) \\ & + \sum_{s \neq l}^{D} \bar{\alpha}_{js}\Uppsi^{\prime}\left(\sum_{s=1}^D\bar{\alpha}_{js}\right)(\left\langle \hbox{ln} \alpha_{js}\right\rangle- \ln\bar{\alpha}_{js}) \biggr] + \hbox{const.}\\ \end{aligned} $$
(48)

Substituting (48) back into (46) gives

$$ \begin{aligned} \hbox{ln}\, Q(\alpha_{jl})=&\sum_{i=1}^{N}r_{ij}\bar{\alpha}_{jl}\ln \alpha_{jl}\biggl[\Uppsi\biggl(\sum_{s=1}^{D}\bar{\alpha}_{js}\biggr) -\Uppsi(\bar{\alpha}_{jl})\\ & + \sum_{s \neq l}^{D} \Uppsi^{\prime}\biggl(\sum_{s=1}^{D}\bar{\alpha}_{js}\biggr) \bar{\alpha}_{js}(\left\langle \hbox{ln}\, \alpha_{js}\right\rangle- \ln\bar{\alpha}_{js}) \biggr]\\ & + \alpha_{jl}\sum_{i=1}^{N}r_{ij}\ln\, X_{il} + (u_{jl}-1)\ln\alpha_{jl} -v_{jl}\alpha_{jl} \\ & + \hbox{const.}\\ =\,& \hbox{ln}\, \alpha_{jl}(u_{jl}+\varphi_{jl}-1) - \alpha_{jl}(v_{jl} - \vartheta_{jl} ) + \hbox{const.}\\ \end{aligned} $$
(49)

where

$$ \begin{aligned} \varphi_{jl} =& \sum\limits_{i=1}^{N}r_{ij}\bar{\alpha}_{jl} \biggl[\Uppsi\biggl(\sum\limits_{s=1}^{D}\bar{\alpha}_{js}\biggr)-\Uppsi(\bar{\alpha}_{jl})\\ & +\sum\limits_{s \neq l}^{D} \Uppsi^{\prime}\biggl(\sum\limits_{s=1}^{D}\bar{\alpha}_{js}\biggr) \bar{\alpha}_{js}(\left\langle\ln \alpha_{js}\right\rangle- \ln\bar{\alpha}_{js})\biggr] \end{aligned} $$
(50)
$$ \vartheta_{jl} = \sum\limits_{i=1}^{N}r_{ij}\,\hbox{ln}\,X_{il} $$
(51)

It is obvious that (49) has the logarithmic form of a Gamma distribution. Taking the exponential of both sides, we obtain

$$ Q(\alpha_{jl}) \propto \alpha_{jl}^{u_{jl}+\varphi_{jl}-1} e^{-(v_{jl}-\vartheta_{jl})\alpha_{jl}} $$
(52)

Thus, the optimal solutions to the hyper-parameters \(\varvec{u}\) and \(\varvec{v}\) can be obtained as

$$ u_{jl}^{\ast} = u_{jl} + \varphi_{jl} \;, \qquad v_{jl}^\ast = v_{jl} - \vartheta_{jl}. $$
(53)
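
As a hedged illustration of (50)–(53), the sketch below evaluates \(\varphi_{jl}\) and \(\vartheta_{jl}\) with SciPy's digamma and trigamma functions and returns the updated hyper-parameters. The expectation \(\left\langle\ln\alpha_{jl}\right\rangle\) is computed with the standard Gamma identity \(\psi(u_{jl})-\ln v_{jl}\), which is an assumption of this example rather than an equation reproduced above; the function and variable names are likewise hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma

def update_gamma_hyperparameters(r, log_X, u, v):
    """Eqs. (50)-(53): update the Gamma hyper-parameters of Q(alpha_jl).

    r     : (N, M) responsibilities r_ij
    log_X : (N, D) array of ln X_il
    u, v  : (M, D) current shape/rate hyper-parameters of Q(alpha_jl)
    """
    alpha_bar = u / v                            # <alpha_jl> = u_jl / v_jl
    ln_alpha = digamma(u) - np.log(v)            # <ln alpha_jl> (Gamma identity, assumed)
    S = alpha_bar.sum(axis=1, keepdims=True)     # sum_s alpha_bar_js, shape (M, 1)
    trigamma_S = polygamma(1, S)                 # Psi'(sum_s alpha_bar_js)

    # Bracketed term of Eq. (50); the sum over s != l is computed as the
    # full sum minus the s = l term.
    diff = ln_alpha - np.log(alpha_bar)          # <ln alpha_js> - ln alpha_bar_js
    full = trigamma_S * (alpha_bar * diff).sum(axis=1, keepdims=True)
    bracket = digamma(S) - digamma(alpha_bar) + full - trigamma_S * alpha_bar * diff

    N_j = r.sum(axis=0)[:, None]                 # sum_i r_ij, shape (M, 1)
    phi = N_j * alpha_bar * bracket              # Eq. (50)
    theta = r.T @ log_X                          # Eq. (51)

    return u + phi, v - theta                    # Eq. (53): u* and v*
```

In the online setting described in the paper, the batch sums over \(i\) shown here would be replaced by their stochastic, incrementally updated counterparts; the batch form is kept only to mirror the appendix equations.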

About this article

Cite this article

Fan, W., Bouguila, N. Online variational learning of finite Dirichlet mixture models. Evolving Systems 3, 153–165 (2012). https://doi.org/10.1007/s12530-012-9047-4
