Nonparametric hierarchical mixture models based on asymmetric Gaussian distribution

https://doi.org/10.1016/j.dsp.2020.102829

Abstract

Mixture models are broadly applied in image processing domains, with hierarchical Bayesian approaches commonly considered for grouped data. The related inference process must consider both accurate approximation of the data's shape and estimation of an adequate number of components. In our work, we develop nonparametric hierarchical Bayesian models using the Dirichlet and Pitman-Yor processes with asymmetric Gaussian distribution. The parameters of these models are learned using variational inference methods. The effectiveness and merits of the proposed approaches are validated using the challenging real-life application of dynamic texture clustering.

Introduction

With the rapid advancement of digital technologies, the need for effective visual data modeling techniques has become more urgent. Among these techniques, finite mixture models have been widely used in a variety of domains, such as image processing and pattern recognition [1], [2], as an efficient unsupervised learning approach. Such models can discover the structure of extracted visual features and classify them into distinct groups.

A challenging problem when applying finite mixture models is model selection (i.e., determining the model's complexity), since an inappropriate number of mixture components can result in poor generalization capability. Numerous studies have been devoted to automatically selecting the number of components that best describes the data, for example maximum likelihood methods in which a model selection criterion is included. Recently, nonparametric Bayesian methods, especially Dirichlet process (DP) mixture models, have been widely considered to deal with the model selection problem [3], [4].

Mixture models that allow the number of clusters to grow to infinity as new data arrive can be viewed as nonparametric models [5]. In one of our earlier works [6], we constructed a DP mixture of asymmetric Gaussian distributions (AGD) that allows simultaneous feature selection for video background subtraction. The DP is a stochastic process, parameterized by a positive scaling factor and a base distribution, which forms a distribution over discrete distributions. A sound alternative to the DP is the Pitman-Yor process (PYP), which can be viewed as a generalization of the DP prior for nonparametric Bayesian modeling [7]. In this paper, we are interested in Bayesian nonparametric models based on the Dirichlet and Pitman-Yor processes.
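For concreteness, both priors admit the standard stick-breaking representation; in conventional notation (not necessarily this paper's), a draw G from a PYP with discount d, concentration α, and base distribution H can be written as

\[
G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k}, \qquad \theta_k \sim H, \qquad \pi_k = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad v_k \sim \mathrm{Beta}(1 - d, \, \alpha + k d),
\]

where setting d = 0 recovers the DP, while d > 0 gives the PYP its characteristic heavier, power-law behavior in cluster sizes.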

Hierarchical Bayesian models have been an attractive research topic and have been successfully applied in various fields such as language modeling and image segmentation [8], [9]. For modeling grouped data with shared clusters, hierarchical nonparametric Bayesian approaches, namely hierarchical DP or hierarchical PYP mixtures, are considered. Within each group, observations are drawn independently from a mixture model, and the number of observations may differ from group to group. Under this hierarchical setting, parameters are shared among groups, and the randomness of the parameters induces dependencies among the groups. Another crucial problem when dealing with vectors of visual descriptors, which we address within our nonparametric Bayesian framework, is parameter estimation. Inference for the resulting models is conducted in a Bayesian setting by means of a variational Bayes technique and a gradient-ascent method, namely black box variational inference (BBVI); a sketch of the latter is given below.
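As background, BBVI maximizes the evidence lower bound by stochastic gradient ascent, estimating the gradient with the score-function (REINFORCE) estimator from Monte Carlo samples of q. The following is a minimal sketch for a single Gaussian variational factor; the function names and the variational family are illustrative assumptions, not the paper's implementation:

    import numpy as np

    def bbvi_gradient(log_joint, mean, log_std, num_samples=64, rng=None):
        """Score-function (BBVI) estimate of the ELBO gradient for a
        Gaussian q(z | mean, exp(log_std)). `log_joint(z)` returns
        log p(x, z) with the data x held fixed (hypothetical callable)."""
        rng = rng or np.random.default_rng(0)
        std = np.exp(log_std)
        grad = np.zeros(2)  # d ELBO / d(mean), d ELBO / d(log_std)
        for _ in range(num_samples):
            z = rng.normal(mean, std)  # sample z ~ q(z)
            log_q = -0.5 * np.log(2 * np.pi) - log_std - 0.5 * ((z - mean) / std) ** 2
            # Gradient of log q(z) w.r.t. (mean, log_std): the "score"
            score = np.array([(z - mean) / std ** 2,
                              ((z - mean) / std) ** 2 - 1.0])
            grad += score * (log_joint(z) - log_q)  # REINFORCE weight
        return grad / num_samples

A gradient-ascent step then updates the variational parameters with this noisy gradient; in practice BBVI also relies on variance-reduction techniques such as Rao-Blackwellization and control variates.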

Another challenging problem when considering mixture models within nonparametric Bayesian frameworks is the choice of the base distribution. The Gaussian distribution has enjoyed great popularity in many fields since it provides interpretable results and is easily generalized to new tasks [10]. Despite its success and usefulness in many application domains, it is not always an adequate choice and suffers from limitations with asymmetrically shaped data, which frequently appears in image processing tasks [11]; this is especially the case for natural images. To achieve more accurate approximation and better modeling performance, we consider the AGD, which is well suited to modeling asymmetric data: this distribution has separate left and right standard deviations to capture the asymmetry of the data [12].
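For reference, a common parameterization of the univariate AGD (as used in [12]; the notation may differ slightly from the paper's) is

\[
f(x \mid \mu, \sigma_l, \sigma_r) = \sqrt{\frac{2}{\pi}} \, \frac{1}{\sigma_l + \sigma_r}
\begin{cases}
\exp\!\left( -\dfrac{(x - \mu)^2}{2 \sigma_l^2} \right), & x < \mu, \\[1ex]
\exp\!\left( -\dfrac{(x - \mu)^2}{2 \sigma_r^2} \right), & x \ge \mu,
\end{cases}
\]

which reduces to the usual Gaussian when σ_l = σ_r.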

The major contributions of this work can be summarized as follows. Firstly, we propose two efficient nonparametric hierarchical models based on DP and PYP mixtures with AGD. Secondly, we develop efficient learning algorithms that estimate both models' parameters through an inference framework integrating coordinate ascent variational inference (CAVI) with the BBVI method. The proposed nonparametric hierarchical Bayesian models and the learning algorithms are validated on a real-life application, namely dynamic texture clustering. It is worth noting that the complexity of the proposed method remains lower than that of MCMC. Furthermore, model selection is performed simultaneously with parameter estimation in the proposed algorithm, whereas MCMC methods usually treat model selection as a preprocessing or post-processing step. Overall, this also leads to faster convergence of the proposed method in comparison to the alternatives.

The remainder of this paper is organized as follows: Section 2 describes the background of the hierarchical DP mixture and the hierarchical PYP mixture, and defines them via the stick-breaking construction. Section 3 develops the variational inference framework used to optimize the evidence lower bound and estimate the parameters of the resulting models. Section 4 presents the complete learning algorithms for our approaches. Section 5 applies the proposed approaches to the challenging task of dynamic texture clustering. Finally, Section 6 concludes the paper.


Hierarchical infinite asymmetric Gaussian mixture

In this section, we briefly introduce our hierarchical DP mixture model of AGD, which may also be referred to as the hierarchical infinite asymmetric Gaussian mixture model.
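For reference, the model follows the standard two-level HDP construction (Teh et al., 2006) with the AGD as the component density; the notation below is the conventional one rather than necessarily the paper's:

\[
G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad G_j \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0) \quad \text{for each group } j,
\]
\[
\theta_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \theta_{ji} \sim \mathrm{AGD}(x_{ji} \mid \theta_{ji}), \qquad \theta_{ji} = (\mu, \sigma_l, \sigma_r).
\]

Because every group-level measure G_j shares the atoms of the global measure G_0, mixture components are shared across groups while each group keeps its own mixing proportions.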

Variational approximation

Variational inference is a well-established method for approximating probability densities through optimization [23], [24]. The idea behind variational inference is to approximate the true posterior distribution p(w|x) with a suitable approximating distribution q(w) from a restricted family of distributions, where w = (C, V, Z, π, μ, σ_l, σ_r) represents the set of latent variables in the HDPAGM model.
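Concretely, the marginal log-likelihood decomposes as

\[
\log p(x) = \underbrace{\mathbb{E}_q[\log p(x, w)] - \mathbb{E}_q[\log q(w)]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\big( q(w) \,\|\, p(w \mid x) \big),
\]

so minimizing the KL divergence between q(w) and the true posterior is equivalent to maximizing the evidence lower bound (ELBO), which is tractable because it involves only the joint distribution p(x, w) rather than the intractable posterior.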

The objective of variational inference is to discover the closest parameters in the constrained variational family.

Learning algorithm

An important aspect when applying variational inference is convergence assessment. In our work, we track convergence systematically by monitoring the ELBO. Convergence is reached when the change in the ELBO between epochs is less than 10^{-2} or the number of iterations exceeds 300; a minimal sketch of this stopping rule follows. The Bayesian inference framework of the HDPAGM is summarized in Algorithm 1.
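For illustration, the stopping rule can be written as follows; `update` and `elbo` are placeholder names for one epoch of variational updates and the bound evaluation, not the authors' API:

    def fit(model, data, tol=1e-2, max_iter=300):
        """Run variational updates until the ELBO stabilizes: stop when
        the ELBO changes by less than `tol` between epochs, or after
        `max_iter` iterations."""
        elbo_prev = float("-inf")
        for _ in range(max_iter):
            model.update(data)       # one epoch of CAVI/BBVI updates (hypothetical)
            elbo = model.elbo(data)  # evidence lower bound (hypothetical)
            if abs(elbo - elbo_prev) < tol:
                break
            elbo_prev = elbo
        return model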

The detailed learning equations of HPYPAGM are presented in Appendix A. The complete learning algorithm is summarized in Algorithm 2.

Experimental results

We evaluate the effectiveness of the proposed HDP mixture and HPYP mixture models with AGD on the challenging application of dynamic texture clustering. In our experiments, we initialize the global truncation level K and the group-level truncation level T to 120 and 60, respectively. For the HDP mixture, the hyperparameters of the stick lengths ω and α are initialized to 0.25; for the HPYP mixture, we set the parameters γ_a, γ_b, β_a, and β_b to 0.25. These settings are collected in the sketch below.
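As a reading aid only, the above initialization corresponds to the following configuration (the variable names are ours, not the authors'):

    # Hypothetical configuration mirroring the reported initialization
    config = {
        "K": 120,         # global truncation level
        "T": 60,          # group-level truncation level
        "omega": 0.25,    # HDP stick-length hyperparameter
        "alpha": 0.25,    # HDP stick-length hyperparameter
        "gamma_a": 0.25,  # HPYP stick-length hyperparameters
        "gamma_b": 0.25,
        "beta_a": 0.25,
        "beta_b": 0.25,
    }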

The hyperparameters of the asymmetric Gaussian base distribution are

Conclusion

In this paper, we have presented a statistical clustering framework based on AGD. This framework is built on nonparametric Bayesian priors. We have proposed and implemented an effective variational inference framework to estimate the latent variables of hierarchical infinite mixtures. To estimate the parameters, we adopt a tractable, fully factorized assumption over the family of latent variables to optimize the lower bound of the likelihood of the models. The effectiveness of these models is

CRediT authorship contribution statement

Ziyang Song: Methodology, Software, Writing - original draft. Samr Ali: Writing - review & editing. Nizar Bouguila: Supervision. Wentao Fan: Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The completion of this research was possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC) grant number 6656-2017 and the National Natural Science Foundation of China (61876068).


References

  • Y.W. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes.
  • Y.W. Teh et al., Hierarchical Bayesian Nonparametric Models with Applications (2010).
  • W. Fan et al., Online learning of hierarchical Pitman-Yor process mixture of generalized Dirichlet distributions with feature selection, IEEE Trans. Neural Netw. Learn. Syst. (2016).
  • S. Park et al., Gaussian assumption: the least favorable but the most useful [lecture notes], IEEE Signal Process. Mag. (May 2013).
  • T.S. Ferguson, A Bayesian analysis of some nonparametric problems, Ann. Stat. (1973).
  • J.E. Gentle, Random Number Generation and Monte Carlo Methods (2006).
  • L. Martino, Independent Random Sampling Methods (2018).
  • Y.W. Teh et al., Hierarchical Dirichlet processes, J. Am. Stat. Assoc. (2006).

Ziyang Song is currently a M.Sc. student at the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, Canada. His research interests include machine learning, probabilistic graphical models, and Bayesian inference.

Samr Ali received her M.Sc. degree in Electrical and Computer Engineering from Abu Dhabi University in 2017. She is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada. She is also the recipient of the prestigious FRQNT award in 2019. Her current research interests include machine learning, pattern recognition, data mining, and computer vision.

Nizar Bouguila received the degree in engineering from the University of Tunis, in 2000, and the M.Sc. and Ph.D. degrees from Sherbrooke University, in 2002 and 2006, respectively, all in computer science. He is currently a Professor with the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, Canada. His research interests include image processing, machine learning, 3D graphics, computer vision, and pattern recognition.

Wentao Fan received the M.Sc. and Ph.D. degrees in electrical and computer engineering from Concordia University, Montreal, QC, Canada, in 2009 and 2014, respectively. He is currently an Associate Professor with the Department of Computer Science and Technology, Huaqiao University, Xiamen, China. His research interests include machine learning, computer vision, and pattern recognition.
