Abstract
In this paper, we present an online variational inference algorithm for learning finite Dirichlet mixture models. Online algorithms process data points one at a time, which is important for real-time applications and for large-scale data sets where batch processing of all data points at once becomes infeasible. By adopting the variational Bayes framework in an online manner, all the involved parameters and the model complexity (i.e. the number of components) of the Dirichlet mixture model can be estimated simultaneously in closed form. The proposed algorithm is validated on both synthetic data sets and a challenging real-world application, namely video background subtraction.
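To illustrate the flavor of the online learning described above, the following minimal sketch (not the paper's actual algorithm) updates the mixing proportions of a two-component model one data point at a time using a Robbins-Monro style decaying step size. The function names and the toy hard-assignment responsibility rule are illustrative assumptions, not the paper's update equations.

```python
import random

def step_size(t, tau=64.0, kappa=0.7):
    # decaying learning rate: sum of rho_t diverges, sum of rho_t^2 converges,
    # the standard conditions for stochastic-approximation convergence
    return (tau + t) ** (-kappa)

random.seed(0)
pi = [0.5, 0.5]               # current mixing proportions
for t in range(1, 5001):
    x = random.random()       # one data point arrives
    # toy hard responsibility: component 0 "explains" points below 0.3
    r = [1.0, 0.0] if x < 0.3 else [0.0, 1.0]
    rho = step_size(t)
    # blend the old estimate with the new point's statistics
    pi = [(1 - rho) * p + rho * ri for p, ri in zip(pi, r)]
```

With enough data the estimate settles near the true component weights (here roughly 0.3 and 0.7), without ever storing the full data set.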
Notes
An interesting theoretical study about the convergence of variational Bayes approaches can be found in Wang and Titterington (2005).
References
Allili M, Bouguila N, Ziou D (2008) Finite general Gaussian mixture modeling and application to image and video foreground segmentation. J Electron Imaging 17(1):1–13
Allili MS, Ziou D, Bouguila N, Boutemedjet S (2010) Image and video segmentation by combining unsupervised generalized Gaussian mixture modeling and feature selection. IEEE Trans Circ Syst Video Technol 20(10):1373–1377
Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10:251–276
Attias H (1999) A variational Bayes framework for graphical models. In: Proceedings of advances in neural information processing systems (NIPS), pp 209–215
Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
Bishop CM, Lawrence N, Jaakkola T, Jordan MI (1997) Approximating posterior distributions in belief networks using mixtures. In: Proceedings of advances in neural information processing systems (NIPS)
Bottou L (1999) Online learning and stochastic approximations. In: On-line learning in neural networks, pp 9–42
Bouguila N, Ziou D (2004) Improving content based image retrieval systems using finite multinomial Dirichlet mixture. In: Proceedings of the 14th IEEE workshop on machine learning for signal processing, Sao Luis, Brazil, pp 23–32
Bouguila N, Ziou D (2005a) MML-based approach for finite Dirichlet mixture estimation and selection. In: Proceedings of the 4th international conference on machine learning and data mining in pattern recognition (MLDM), LNAI3587. Springer, Berlin, pp 42–51
Bouguila N, Ziou D (2005b) On fitting finite Dirichlet mixture using ECM and MML. In: Singh S et al (eds) Pattern recognition and data mining, third international conference on advances in pattern recognition, ICAPR (1). LNCS 3686. Springer, Berlin, pp 172–182
Bouguila N, Ziou D (2005c) A probabilistic approach for shadows modeling and detection. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 329–332
Bouguila N, Ziou D (2005d) Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications. Pattern Recognit Lett 26(12):1916–1925
Bouguila N, Ziou D (2006a) Online clustering via finite mixtures of Dirichlet and minimum message length. Eng Appl Artif Intell 19(4):371–379
Bouguila N, Ziou D (2006b) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009
Bouguila N, Ziou D (2007) Unsupervised learning of a finite discrete mixture: applications to texture modeling and image databases summarization. J Vis Commun Image Represent 18(4):295–309
Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543
Bouguila N, Wang JH, Hamza AB (2010) Software modules categorization through likelihood and Bayesian analysis of finite Dirichlet mixtures. J Appl Stat 37(2):235–252
Corduneanu A, Bishop CM (2001) Variational Bayesian model selection for mixture distributions. In: Proceedings of the 8th international conference on artificial intelligence and statistics (AISTAT), pp 27–34
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
Diaconis P, Ylvisaker D (1979) Conjugate priors for exponential families. Ann Stat 7:269–281
Fujimaki R, Sogawa Y, Morinaga S (2011) Online heterogeneous mixture modeling with marginal and copula selection. In: KDD, pp 645–653
Hoffman MD, Blei DM, Bach F (2010) Online learning for latent Dirichlet allocation. In: Proceedings of neural information processing systems (NIPS)
Jordan MI, Ghahramani Z, Jaakkola T, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233
Kushner H, Yin G (1997) Stochastic approximation algorithms and applications. Applications of mathematics, Springer, Berlin
Lawrence ND, Bishop CM, Jordan MI (1998) Mixture representations for inference and learning in Boltzmann machines. In: Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI), pp 320–327
Lee DS (2005) Effective Gaussian mixture learning for video background subtraction. IEEE Trans Pattern Anal Mach Intell 27(5):827–832
Ma Z, Leijon A (2011) Bayesian estimation of beta mixture models with variational inference. IEEE Trans Pattern Anal Mach Intell 33(11):2160–2173
Meier T, Ngan K (1998) Automatic segmentation of moving objects for video object plane generation. IEEE Trans Circ Syst Video Technol 8(5):525–538
Nasios N, Bors AG (2006) Variational learning for Gaussian mixture models. IEEE Trans Syst Man Cybern B: Cybern 36(4):849–862
Parisi G (1988) Statistical field theory. Addison-Wesley
Piccardi M (2004) Background subtraction techniques: a review. In: IEEE international conference on systems, man and cybernetics (SMC), vol 4, pp 3099–3104
Robert C (2001) The Bayesian choice. Springer, Berlin
Robert C, Casella G (1999) Monte Carlo statistical methods. Springer, Berlin
Sato MA (2001) Online model selection based on the variational Bayes. Neural Comput 13:1649–1681
Stauffer C, Grimson W (1999) Adaptive background mixture models for real-time tracking. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 246–252
Stauffer C, Grimson W (2000) Learning patterns of activity using real-time tracking. IEEE Trans Pattern Anal Mach Intell 22(8):747–757
Stiller C (1997) Object-based estimation of dense motion fields. IEEE Trans Image Process 6(2):234–250
Wang J, Adelson E (1994) Representing moving images with layers. IEEE Trans Image Process 3(5):625–638
Wang B, Titterington DM (2004) Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. In: Proceedings of the 20th conference on uncertainty in artificial intelligence (UAI), pp 577–584
Wang B, Titterington DM (2005) Inadequacy of interval estimates corresponding to variational Bayesian approximations. In: Proceedings of the international conference on artificial intelligence and statistics (AISTAT), pp 373–380
Wang C, Paisley JW, Blei DM (2011) Online variational inference for the hierarchical Dirichlet process. In: Proceedings of the 14th international conference on artificial intelligence and statistics (AISTAT)
Woolrich MW, Behrens TE (2006) Variational Bayes inference of spatial mixture models for segmentation. IEEE Trans Med Imaging 25(10):1380–1391
Zivkovic Z, van der Heijden F (2004) Recursive unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 26:651–656
Zivkovic Z, van der Heijden F (2006) Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit Lett 27:773–780
Acknowledgments
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).
Appendices
Appendix 1: Proof of Equation (15)
The variational parameter \(r_{ij}\) is calculated by setting the derivative of \(\mathcal{L}(Q)\) with respect to \(r_{ij}\) to zero. Here we must take into account the constraint \(\sum_{j=1}^{M} r_{ij} = 1,\) which can be enforced by adding a Lagrange multiplier term to \(\mathcal{L}(Q).\) Taking the derivative with respect to \(r_{ij}\) and setting the result to zero, we get
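The Lagrange-multiplier solution described above has the familiar softmax form: each unnormalized term is exponentiated and divided by the sum over components, so the responsibilities of one observation sum to one. A minimal sketch of that normalization step (the function name and inputs are illustrative assumptions, not the paper's notation):

```python
import math

def normalize_responsibilities(log_rho):
    # Lagrange-multiplier solution: r_ij = exp(log_rho_ij) / sum_k exp(log_rho_ik).
    # Subtracting the max first is the log-sum-exp trick for numerical stability.
    m = max(log_rho)
    exps = [math.exp(v - m) for v in log_rho]
    s = sum(exps)
    return [e / s for e in exps]

r = normalize_responsibilities([-1.2, 0.3, -0.5])
```

The resulting vector is a valid set of responsibilities: non-negative and summing to one, with the largest unnormalized term receiving the largest share.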
where
Unfortunately, a closed-form expression cannot be found for \(\mathcal{R}_{j},\) so variational inference cannot be applied directly. Here, we approximate \(\mathcal{R}_j\) by a second-order Taylor expansion about the expected values of the parameters \(\varvec{\alpha}_j.\) The resulting approximation, denoted \(\mathcal{\widetilde{R}}_j,\) is defined in (16). Substituting \(\mathcal{\widetilde{R}}_j\) for \(\mathcal{R}_j\) in (39) and applying some algebra, we then have
By substituting (41) back into (39), we then obtain
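The second-order Taylor expansion used above replaces an intractable function by the quadratic \(f(a) + f'(a)(x-a) + \tfrac{1}{2}f''(a)(x-a)^2\) about the expansion point \(a\). The sketch below demonstrates the idea on \(\ln\Gamma\) (the kind of function appearing inside \(\mathcal{R}_j\)), using finite differences for the derivatives; the helper name and step size are illustrative assumptions.

```python
import math

def taylor2(f, a, h=1e-4):
    # second-order Taylor approximation of f about a, with the first and
    # second derivatives estimated by central finite differences
    f_a = f(a)
    d1 = (f(a + h) - f(a - h)) / (2 * h)
    d2 = (f(a + h) - 2 * f_a + f(a - h)) / (h * h)
    return lambda x: f_a + d1 * (x - a) + 0.5 * d2 * (x - a) ** 2

# expand log-Gamma about the point a = 2.0
approx = taylor2(math.lgamma, 2.0)
```

Near the expansion point the quadratic tracks the true function closely, which is why expanding about the expected value of the parameters keeps the approximation accurate where the posterior mass concentrates.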
Appendix 2: Proof of Equation (17)
The mixing coefficients \(\varvec{\pi}\) are calculated by maximizing the lower bound with respect to \(\varvec{\pi}.\) A Lagrange multiplier term must be added to the lower bound to enforce the constraint \(\sum_{j=1}^{M} \pi_j = 1.\) Taking the derivative with respect to \(\pi_j\) and setting the result to zero, we have
Summing both sides of (44) over j gives λ = −N. Substituting this value of λ back into (43), we obtain
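The resulting update is the standard mixture-model one: each mixing coefficient is the average responsibility its component receives over the N observations, which is exactly where the multiplier value λ = −N comes from. A minimal sketch (function and variable names are illustrative assumptions):

```python
def update_mixing_coefficients(R):
    # R is an N x M matrix of responsibilities with each row summing to 1;
    # pi_j = (1/N) * sum_i r_ij, the average responsibility of component j
    N = len(R)
    M = len(R[0])
    return [sum(R[i][j] for i in range(N)) / N for j in range(M)]

pi = update_mixing_coefficients([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.4, 0.6]])
```

Because every row of R sums to one, the updated coefficients automatically satisfy the simplex constraint without any explicit renormalization.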
Appendix 3: Proof of Equations (18) and (19)
For the variational factor Q(α jl ), instead of using the gradient method, it is more straightforward to use (11) to compute the variational solution directly. The logarithm of Q(α jl ) is thus given by
where
Notice that \(\mathcal{J}(\alpha_{jl})\) is a function of α jl that is unfortunately analytically intractable. We can obtain an approximate lower bound by applying a first-order Taylor expansion about \(\bar{\alpha}_{jl}\) (the expected value of α jl ) as
Substituting (48) back into (46) gives
where
It is clear that (49) has the logarithmic form of a Gamma distribution. Taking the exponential of both sides gives
Thus, the optimal solutions to the hyper-parameters \(\varvec{u}\) and \(\varvec{v}\) can be obtained as
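The step from (49) to a Gamma posterior rests on recognizing the log-density pattern \((u-1)\ln\alpha - v\alpha + \text{const}\) as \(\operatorname{Gamma}(\alpha \mid u, v)\) with shape u and rate v, whose mean u/v then serves as \(\bar{\alpha}_{jl}\). The sketch below (names and grid parameters are illustrative assumptions) verifies that pattern-match numerically by integrating the unnormalized density and recovering the mean u/v:

```python
import math

def unnormalized_log_gamma_pdf(alpha, u, v):
    # log-density pattern of (49): (u - 1) * ln(alpha) - v * alpha, up to a
    # constant; this is the log of a Gamma(shape=u, rate=v) density
    return (u - 1) * math.log(alpha) - v * alpha

u, v = 3.0, 2.0
# midpoint grid on (0, 20); the Gamma tail beyond 20 is negligible here
xs = [i * 0.001 + 0.0005 for i in range(20000)]
w = [math.exp(unnormalized_log_gamma_pdf(x, u, v)) for x in xs]
Z = sum(w)                                          # normalizing constant
mean = sum(x * wi for x, wi in zip(xs, w)) / Z      # numerical posterior mean
```

The numerically computed mean matches the closed-form Gamma mean u/v = 1.5, confirming that exponentiating (49) yields a properly normalizable Gamma factor whose expectation is available in closed form.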
Fan, W., Bouguila, N. Online variational learning of finite Dirichlet mixture models. Evolving Systems 3, 153–165 (2012). https://doi.org/10.1007/s12530-012-9047-4