Abstract
In this paper, we present an online variational inference algorithm for learning finite Dirichlet mixture models. Online algorithms process data points one at a time, which is important for real-time applications and for large-scale data sets where batch processing of all data points at once becomes infeasible. By adopting the variational Bayes framework in an online manner, all the involved parameters and the model complexity (i.e. the number of components) of the Dirichlet mixture model can be estimated simultaneously in closed form. The proposed algorithm is validated on both synthetic data sets and a challenging real-world application, namely video background subtraction.
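To illustrate the flavor of the online learning described above, the following minimal sketch (not the paper's actual algorithm) updates the mixing proportions of a two-component model one data point at a time using a Robbins-Monro style decaying step size. The function names and the toy hard-assignment responsibility rule are illustrative assumptions, not the paper's update equations.

```python
import random

def step_size(t, tau=64.0, kappa=0.7):
    # decaying learning rate: sum of rho_t diverges, sum of rho_t^2 converges,
    # the standard conditions for stochastic-approximation convergence
    return (tau + t) ** (-kappa)

random.seed(0)
pi = [0.5, 0.5]               # current mixing proportions
for t in range(1, 5001):
    x = random.random()       # one data point arrives
    # toy hard responsibility: component 0 "explains" points below 0.3
    r = [1.0, 0.0] if x < 0.3 else [0.0, 1.0]
    rho = step_size(t)
    # blend the old estimate with the new point's statistics
    pi = [(1 - rho) * p + rho * ri for p, ri in zip(pi, r)]
```

With enough data the estimate settles near the true component weights (here roughly 0.3 and 0.7), without ever storing the full data set.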
Notes
An interesting theoretical study about the convergence of variational Bayes approaches can be found in Wang and Titterington (2005).
References
Allili M, Bouguila N, Ziou D (2008) Finite general Gaussian mixture modeling and application to image and video foreground segmentation. J Electron Imaging 17(1):1–13
Allili MS, Ziou D, Bouguila N, Boutemedjet S (2010) Image and video segmentation by combining unsupervised generalized Gaussian mixture modeling and feature selection. IEEE Trans Circ Syst Video Technol 20(10):1373–1377
Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10:251–276
Attias H (1999) A variational Bayes framework for graphical models. In: Proceedings of advances in neural information processing systems (NIPS), pp 209–215
Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
Bishop CM, Lawrence N, Jaakkola T, Jordan MI (1997) Approximating posterior distributions in belief networks using mixtures. In: Proceedings of advances in neural information processing systems (NIPS)
Bottou L (1999) Online learning and stochastic approximations. In: On-line learning in neural networks, pp 9–42
Bouguila N, Ziou D (2004) Improving content based image retrieval systems using finite multinomial Dirichlet mixture. In: Proceedings of the 14th IEEE workshop on machine learning for signal processing, Sao Luis, Brazil, pp 23–32
Bouguila N, Ziou D (2005a) MML-based approach for finite Dirichlet mixture estimation and selection. In: Proceedings of the 4th international conference on machine learning and data mining in pattern recognition (MLDM), LNAI3587. Springer, Berlin, pp 42–51
Bouguila N, Ziou D (2005b) On fitting finite Dirichlet mixture using ECM and MML. In: Singh S et al (eds) Pattern recognition and data mining, third international conference on advances in pattern recognition, ICAPR (1). LNCS 3686. Springer, Berlin, pp 172–182
Bouguila N, Ziou D (2005c) A probabilistic approach for shadows modeling and detection. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 329–332
Bouguila N, Ziou D (2005d) Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications. Pattern Recognit Lett 26(12):1916–1925
Bouguila N, Ziou D (2006a) Online clustering via finite mixtures of Dirichlet and minimum message length. Eng Appl Artif Intell 19(4):371–379
Bouguila N, Ziou D (2006b) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009
Bouguila N, Ziou D (2007) Unsupervised learning of a finite discrete mixture: applications to texture modeling and image databases summarization. J Vis Commun Image Represent 18(4):295–309
Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543
Bouguila N, Wang JH, Hamza AB (2010) Software modules categorization through likelihood and Bayesian analysis of finite Dirichlet mixtures. J Appl Stat 37(2):235–252
Corduneanu A, Bishop CM (2001) Variational Bayesian model selection for mixture distributions. In: Proceedings of the 8th international conference on artificial intelligence and statistics (AISTAT), pp 27–34
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
Diaconis P, Ylvisaker D (1979) Conjugate priors for exponential families. Ann Stat 7:269–281
Fujimaki R, Sogawa Y, Morinaga S (2011) Online heterogeneous mixture modeling with marginal and copula selection. In: KDD, pp 645–653
Hoffman MD, Blei DM, Bach F (2010) Online learning for latent Dirichlet allocation. In: Proceedings of neural information processing systems (NIPS)
Jordan MI, Ghahramani Z, Jaakkola T, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233
Kushner H, Yin G (1997) Stochastic approximation algorithms and applications. Applications of mathematics, Springer, Berlin
Lawrence ND, Bishop CM, Jordan MI (1998) Mixture representations for inference and learning in Boltzmann machines. In: Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI), pp 320–327
Lee DS (2005) Effective Gaussian mixture learning for video background subtraction. IEEE Trans Pattern Anal Mach Intell 27(5):827–832
Ma Z, Leijon A (2011) Bayesian estimation of beta mixture models with variational inference. IEEE Trans Pattern Anal Mach Intell 33(11):2160–2173
Meier T, Ngan K (1998) Automatic segmentation of moving objects for video object plane generation. IEEE Trans Circ Syst Video Technol 8(5):525–538
Nasios N, Bors AG (2006) Variational learning for Gaussian mixture models. IEEE Trans Syst Man Cybern B: Cybern 36(4):849–862
Parisi G (1988) Statistical field theory. Addison-Wesley
Piccardi M (2004) Background subtraction techniques: a review. In: IEEE international conference on systems, man and cybernetics (SMC), vol 4, pp 3099–3104
Robert C (2001) The Bayesian choice. Springer, Berlin
Robert C, Casella G (1999) Monte Carlo statistical methods. Springer, Berlin
Sato MA (2001) Online model selection based on the variational Bayes. Neural Comput 13:1649–1681
Stauffer C, Grimson W (1999) Adaptive background mixture models for real-time tracking. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 246–252
Stauffer C, Grimson W (2000) Learning patterns of activity using real-time tracking. IEEE Trans Pattern Anal Mach Intell 22(8):747–757
Stiller C (1997) Object-based estimation of dense motion fields. IEEE Trans Image Process 6(2):234–250
Wang J, Adelson E (1994) Representing moving images with layers. IEEE Trans Image Process 3(5):625–638
Wang B, Titterington DM (2004) Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. In: Proceedings of the 20th conference on uncertainty in artificial intelligence (UAI), pp 577–584
Wang B, Titterington DM (2005) Inadequacy of interval estimates corresponding to variational Bayesian approximations. In: Proceedings of the international conference on artificial intelligence and statistics (AISTAT), pp 373–380
Wang C, Paisley JW, Blei DM (2011) Online variational inference for the hierarchical Dirichlet process. In: Proceedings of the 14th international conference on artificial intelligence and statistics (AISTAT)
Woolrich MW, Behrens TE (2006) Variational Bayes inference of spatial mixture models for segmentation. IEEE Trans Med Imaging 25(10):1380–1391
Zivkovic Z, van der Heijden F (2004) Recursive unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 26:651–656
Zivkovic Z, van der Heijden F (2006) Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit Lett 27:773–780
Acknowledgments
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).
Appendices
Appendix 1: Proof of Equation (15)
The variational parameter \(r_{ij}\) is calculated by setting the derivative of \(\mathcal{L}(Q)\) with respect to \(r_{ij}\) to zero. Here we must take into account the constraint \(\sum_{j=1}^{M} r_{ij} = 1,\) which can be enforced by adding a Lagrange multiplier term to \(\mathcal{L}(Q).\) Taking the derivative with respect to \(r_{ij}\) and setting the result to zero, we get
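The Lagrange-multiplier solution described above has the familiar softmax form: each unnormalized term is exponentiated and divided by the sum over components, so the responsibilities of one observation sum to one. A minimal sketch of that normalization step (the function name and inputs are illustrative assumptions, not the paper's notation):

```python
import math

def normalize_responsibilities(log_rho):
    # Lagrange-multiplier solution: r_ij = exp(log_rho_ij) / sum_k exp(log_rho_ik).
    # Subtracting the max first is the log-sum-exp trick for numerical stability.
    m = max(log_rho)
    exps = [math.exp(v - m) for v in log_rho]
    s = sum(exps)
    return [e / s for e in exps]

r = normalize_responsibilities([-1.2, 0.3, -0.5])
```

The resulting vector is a valid set of responsibilities: non-negative and summing to one, with the largest unnormalized term receiving the largest share.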
where
Unfortunately, a closed-form expression cannot be found for \(\mathcal{R}_{j},\) so variational inference cannot be applied directly. Here, we approximate \(\mathcal{R}_j\) by a second-order Taylor expansion about the expected values of the parameters \(\varvec{\alpha}_j.\) The resulting approximation, denoted \(\mathcal{\widetilde{R}}_j,\) is defined in (16). Substituting \(\mathcal{\widetilde{R}}_j\) for \(\mathcal{R}_j\) in (39) and applying some algebra, we then have
By substituting (41) back into (39), we then obtain
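The second-order Taylor expansion used above replaces an intractable function by the quadratic \(f(a) + f'(a)(x-a) + \tfrac{1}{2}f''(a)(x-a)^2\) about the expansion point \(a\). The sketch below demonstrates the idea on \(\ln\Gamma\) (the kind of function appearing inside \(\mathcal{R}_j\)), using finite differences for the derivatives; the helper name and step size are illustrative assumptions.

```python
import math

def taylor2(f, a, h=1e-4):
    # second-order Taylor approximation of f about a, with the first and
    # second derivatives estimated by central finite differences
    f_a = f(a)
    d1 = (f(a + h) - f(a - h)) / (2 * h)
    d2 = (f(a + h) - 2 * f_a + f(a - h)) / (h * h)
    return lambda x: f_a + d1 * (x - a) + 0.5 * d2 * (x - a) ** 2

# expand log-Gamma about the point a = 2.0
approx = taylor2(math.lgamma, 2.0)
```

Near the expansion point the quadratic tracks the true function closely, which is why expanding about the expected value of the parameters keeps the approximation accurate where the posterior mass concentrates.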
Appendix 2: Proof of Equation (17)
The mixing coefficients \(\varvec{\pi}\) are calculated by maximizing the lower bound with respect to \(\varvec{\pi}.\) A Lagrange multiplier term must be added to the lower bound to enforce the constraint \(\sum_{j=1}^{M} \pi_j = 1.\) Taking the derivative with respect to \(\pi_j\) and setting the result to zero, we have
Summing both sides of (44) over j gives λ = −N. Substituting this value of λ back into (43), we obtain
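The resulting update is the standard mixture-model one: each mixing coefficient is the average responsibility its component receives over the N observations, which is exactly where the multiplier value λ = −N comes from. A minimal sketch (function and variable names are illustrative assumptions):

```python
def update_mixing_coefficients(R):
    # R is an N x M matrix of responsibilities with each row summing to 1;
    # pi_j = (1/N) * sum_i r_ij, the average responsibility of component j
    N = len(R)
    M = len(R[0])
    return [sum(R[i][j] for i in range(N)) / N for j in range(M)]

pi = update_mixing_coefficients([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.4, 0.6]])
```

Because every row of R sums to one, the updated coefficients automatically satisfy the simplex constraint without any explicit renormalization.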
Appendix 3: Proof of Equations (18) and (19)
For the variational factor Q(α jl ), instead of using the gradient method, it is more straightforward to use (11) to compute the variational solution directly. The logarithm of Q(α jl ) is thus given by
where
Notice that \(\mathcal{J}(\alpha_{jl})\) is a function of α jl that is unfortunately analytically intractable. We can obtain an approximate lower bound by applying a first-order Taylor expansion about \(\bar{\alpha}_{jl}\) (the expected value of α jl ) as
Substituting (48) back into (46) gives
where
It is clear that (49) has the logarithmic form of a Gamma distribution. Taking the exponential of both sides gives
Thus, the optimal solutions to the hyper-parameters \(\varvec{u}\) and \(\varvec{v}\) can be obtained as
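The step from (49) to a Gamma posterior rests on recognizing the log-density pattern \((u-1)\ln\alpha - v\alpha + \text{const}\) as \(\operatorname{Gamma}(\alpha \mid u, v)\) with shape u and rate v, whose mean u/v then serves as \(\bar{\alpha}_{jl}\). The sketch below (names and grid parameters are illustrative assumptions) verifies that pattern-match numerically by integrating the unnormalized density and recovering the mean u/v:

```python
import math

def unnormalized_log_gamma_pdf(alpha, u, v):
    # log-density pattern of (49): (u - 1) * ln(alpha) - v * alpha, up to a
    # constant; this is the log of a Gamma(shape=u, rate=v) density
    return (u - 1) * math.log(alpha) - v * alpha

u, v = 3.0, 2.0
# midpoint grid on (0, 20); the Gamma tail beyond 20 is negligible here
xs = [i * 0.001 + 0.0005 for i in range(20000)]
w = [math.exp(unnormalized_log_gamma_pdf(x, u, v)) for x in xs]
Z = sum(w)                                          # normalizing constant
mean = sum(x * wi for x, wi in zip(xs, w)) / Z      # numerical posterior mean
```

The numerically computed mean matches the closed-form Gamma mean u/v = 1.5, confirming that exponentiating (49) yields a properly normalizable Gamma factor whose expectation is available in closed form.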
Fan, W., Bouguila, N. Online variational learning of finite Dirichlet mixture models. Evolving Systems 3, 153–165 (2012). https://doi.org/10.1007/s12530-012-9047-4