Signal Processing

Volume 93, Issue 11, November 2013, Pages 2894-2905

Bayesian mixtures of common factor analyzers: Model, variational inference, and applications

https://doi.org/10.1016/j.sigpro.2013.04.007

Highlights

  • We propose the Bayesian mixtures of common factor analyzers (BMCFA) model.

  • We derive an efficient variational Bayesian inference algorithm for the BMCFA model.

  • The BMCFA approach (model + variational inference) is applied to the problem of clustering high-dimensional data sets.

  • This approach can automatically determine the appropriate number of clusters.

Abstract

Recently, a representative approach, named mixtures of common factor analyzers (MCFA), was proposed for clustering high-dimensional observed data. Existing model-parameter estimation methods for this approach are based on the maximum likelihood criterion and performed by the expectation–maximization algorithm. In this paper, we consider the MCFA from a Bayesian perspective and propose the Bayesian mixtures of common factor analyzers (BMCFA) model, which replaces the deterministic model parameters in the MCFA with stochastic variables. We then present a variational inference algorithm for the BMCFA model. Moreover, the proposed BMCFA model and the associated variational inference algorithm are used for clustering high-dimensional synthetic data, the wine data from the UCI machine learning repository, and gene expression data. Experimental results illustrate that the BMCFA has good generalization capacity, automatically determining the appropriate number of clusters from high-dimensional observed data.

Introduction

Factor analysis is a statistical scheme used to describe variability among high-dimensional observed data in terms of potentially low-dimensional latent variables called factors [1]. Factor analysis is commonly used for explaining observed data and for dimension reduction in the domains of information science, behavioral science, social science, marketing, product management, and other applied sciences.
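For concreteness, the basic factor analysis model referred to above can be written as the linear-Gaussian model

y_n = \mu + A u_n + e_n, \qquad u_n \sim \mathcal{N}(0, I_q), \qquad e_n \sim \mathcal{N}(0, D),

where y_n \in \mathbb{R}^p is the observation, u_n \in \mathbb{R}^q (q < p) is the latent factor, A is the p \times q factor loading matrix, and D is a diagonal covariance matrix. Marginally, y_n \sim \mathcal{N}(\mu, A A^\top + D), so the full p \times p covariance structure is summarized by pq + p free parameters.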

Though factor analysis can be used for representing observations in a low-dimensional latent space, the effectiveness of this statistical technique is limited by its global linearity. Combining local factor analyzers in the form of a finite mixture yields the so-called mixtures of factor analyzers (MFA) [2]. The MFA can approximate the globally nonlinear latent space of high-dimensional observed variables by a finite mixture, further widening the applicability of factor analysis [3]. The model parameters of the MFA can be statistically estimated based on the maximum likelihood criterion. In [2] and [4], the expectation–maximization (EM) algorithm [5] and the alternating expectation conditional maximization algorithm were proposed, respectively, for finding maximum likelihood solutions for the MFA parameters. In [6], a fast expectation conditional maximization algorithm was presented. As closed-form expressions can be obtained explicitly in all conditional maximization steps, this algorithm converges faster than the algorithms in [2], [4]. In [7], a direct procedure based on the maximum likelihood criterion was developed. Moreover, in [8], model parameter estimation for the MFA was performed from the Bayesian perspective.

In some applications, when the dimension of the observed data is high and/or the number of mixture components is not small, the parameters in the MFA may still not be manageable. In other words, the MFA has to fit more parameters from the limited observed data in these cases, leading to performance deterioration. To tackle this problem, a new model, named mixtures of common factor analyzers (MCFA) [9], [10], was proposed. The differences between the MFA and the MCFA are obvious. In the MFA, each mixing component has its own factor loading matrix, while the distribution of the factors is the same for all components, namely the standard normal distribution. In contrast, in the MCFA, the factor loading matrix is common to all components, while each mixing component has its own factor distribution. Compared with the MFA, the MCFA has two main properties [10]. First, when the dimension of the observed data is high and/or the number of mixing components is not small, the number of parameters in the MCFA does not grow as quickly, providing preferable performance at the expense of more distributional restrictions on the observed data. Second, since the latent factors in different components are allowed to have different means and covariance matrices, the estimated factors corresponding to the observed data in the MCFA can be used to portray the latter in a low-dimensional space, thus enabling us to perform clustering in this low-dimensional space. This function cannot be realized by the MFA. For a detailed comparison of the two approaches, refer to [10].
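To make the contrast concrete, the two component models can be written side by side (the MFA form follows Section 2; the MCFA form is as given in [9], [10]):

\text{MFA:} \quad y_n = \mu_i + A_i u_n + e_{ni}, \qquad u_n \sim \mathcal{N}(0, I_q);
\text{MCFA:} \quad y_n = A u_n + e_n, \qquad u_n \sim \mathcal{N}(\xi_i, \Omega_i),

each case holding with probability \pi_i. The MFA carries a separate p \times q loading matrix per component (on the order of Ipq loading parameters), whereas the MCFA shares a single loading matrix A (pq parameters) and moves the component-specific structure into the q-dimensional factor means \xi_i and covariances \Omega_i, whose estimates also provide the low-dimensional representation used for clustering.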

In the MCFA, model parameter estimation is based on the maximum likelihood criterion and performed by the EM algorithm. Though this kind of parameter estimation algorithm has been widely used in other statistical models as well, it has several limitations. First, as the parameter estimation algorithm of the MCFA is only associated with the likelihood, it inevitably suffers from the presence of singularities in the likelihood function [11]. Second, as this maximum likelihood estimation algorithm only considers fitting performance for the observed data, it is prone to selecting relatively complex model structures, leading to overfitting. In real applications, the optimal model structure and parameters should be obtained by considering the tradeoff between fitting performance and model complexity.

To address the issues listed above, we consider the MCFA from a Bayesian perspective in this paper. To this end, prior distributions are first placed on the parameters of the MCFA. Bayes' rule is then adopted to obtain the corresponding posterior distributions. This new model is named the Bayesian mixtures of common factor analyzers (BMCFA). It is known that under the Bayesian framework, the most difficult task of inference is to evaluate integrals of multivariate distributions over the entire parameter space when calculating posteriors. As these integrals are intractable in most cases, several methods have been proposed to approximate them. The variational inference method [12], [13], [14], originating from mean field theory, is one of the most representative approaches and has been successfully adopted in several models and applications [15], [16], [17], [18], [19]. In this paper, we derive a variational inference algorithm for the BMCFA model, which is deterministic and yields computationally tractable expressions for the posteriors, manifested as an iterative hyperparameter update process. The BMCFA approach (i.e., the BMCFA model together with the variational inference algorithm) can tackle the issues in the MCFA listed above. Moreover, when the BMCFA approach is used for clustering high-dimensional observed data, it can automatically determine the appropriate number of clusters.
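As an illustration of what replacing deterministic parameters by stochastic variables entails, one conjugate specification commonly used for Bayesian mixture and factor models is

\pi \sim \mathrm{Dirichlet}(\alpha_0 \mathbf{1}_I), \qquad \xi_i \sim \mathcal{N}(m_0, \Sigma_0), \qquad d_j^{-1} \sim \mathrm{Gamma}(a_0, b_0),

where d_j is the jth diagonal entry of the noise covariance and \xi_i is the factor mean of the ith component; the concrete prior choices shown here are illustrative assumptions, not necessarily those adopted in Section 3. With conjugate choices of this kind, each factor of the variational posterior remains in the same family as its prior, which is what makes closed-form hyperparameter updates possible.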

The rest of this paper is organized as follows. In Section 2, a brief overview of the MFA and the MCFA is provided. In Section 3, the BMCFA model is proposed. In Section 4, a variational inference algorithm for the BMCFA model is derived. In Section 5, the proposed BMCFA model and the associated variational inference algorithm are utilized in clustering a high-dimensional synthetic data set, the wine data, and the gene expression data. Finally, conclusions are drawn in Section 6.

Section snippets

Mixtures of factor analyzers (MFA)

Let the observed data set be Y = \{y_1, \ldots, y_N\}. The MFA assumes that each p-dimensional data vector y_n is generated as

y_n = \mu_i + A_i u_n + e_{ni} \quad \text{with probability } \pi_i \quad (i = 1, \ldots, I),

where I is the number of mixing components. The corresponding q-dimensional (q < p) factor u_n \sim \mathcal{N}(0, I_q) is independent of the i.i.d. e_{ni} \sim \mathcal{N}(0, D_i), where D_i is a p \times p diagonal matrix. The parameter \mu_i is the mean of the ith analyzer and A_i (p \times q) is the linear transformation known as the factor loading matrix. The so-called mixing proportions \pi_i (i = 1, \ldots,
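Under this model, the marginal density of y_n is the Gaussian mixture \sum_{i=1}^{I} \pi_i \, \mathcal{N}(y_n; \mu_i, A_i A_i^\top + D_i), which suffices for scoring cluster memberships. The snippet below is a minimal sketch of that computation (illustrative code, not the authors' implementation; the parameter arrays pi, mu, A, and D are hypothetical inputs):

    import numpy as np
    from scipy.stats import multivariate_normal

    def mfa_responsibilities(Y, pi, mu, A, D):
        # Y: (N, p) observations; pi: (I,) mixing proportions;
        # mu: (I, p) component means; A: (I, p, q) loading matrices;
        # D: (I, p) diagonals of the component noise covariances.
        N, I = Y.shape[0], len(pi)
        log_r = np.empty((N, I))
        for i in range(I):
            # Marginal covariance of component i: Sigma_i = A_i A_i^T + D_i.
            Sigma = A[i] @ A[i].T + np.diag(D[i])
            log_r[:, i] = np.log(pi[i]) + multivariate_normal.logpdf(Y, mu[i], Sigma)
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(log_r)
        return r / r.sum(axis=1, keepdims=True)    # each row sums to one

Working in the log domain before normalizing avoids numerical underflow when p is large, which matters in exactly the high-dimensional regime targeted by the MCFA and BMCFA.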

Bayesian mixtures of common factor analyzers (BMCFA)

Let us now consider the problem of treating the MCFA in the Bayesian framework. As the BMCFA is a mixture model, an I-dimensional binary latent stochastic variable z_n is introduced, associated with the observed data y_n. In z_n, the element z_{ni} is one or zero according to whether y_n does or does not belong to the ith component of the mixture. The whole latent variable set is represented as Z = \{z_1, \ldots, z_N\}.

According to (2), the marginal distribution over Z is specified in terms of the
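In standard treatments of such mixture models, this marginal takes the factorized form

p(Z \mid \pi) = \prod_{n=1}^{N} \prod_{i=1}^{I} \pi_i^{z_{ni}},

with the mixing proportions \pi_i acting as the prior probabilities of the component indicators z_{ni}.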

Variational inference algorithm for the BMCFA

After the BMCFA is established, the next task is to obtain the posterior distributions of the stochastic variables in Fig. 1, which is realized by the inference process. For inference in the Bayesian framework, we need the expression of the log-likelihood, \log p(Y) = \log \int p(Y, \Theta) \, d\Theta. Unfortunately, there are multiple interacting stochastic variables in p(Y, \Theta), resulting in an intractable integral. To overcome this calculation difficulty, we use a variational approximation method to explore a
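The standard variational workaround for this intractable integral is to lower-bound it: for any distribution q(\Theta),

\log p(Y) = \mathcal{L}(q) + \mathrm{KL}\big(q(\Theta) \,\|\, p(\Theta \mid Y)\big), \qquad \mathcal{L}(q) = \int q(\Theta) \log \frac{p(Y, \Theta)}{q(\Theta)} \, d\Theta,

so maximizing the tractable lower bound \mathcal{L}(q) over a factorized family q(\Theta) = \prod_k q_k(\Theta_k) both tightens the approximation to \log p(Y) and drives q toward the true posterior.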

Experimental results

In this section, we apply the proposed BMCFA approach to clustering a synthetic data set and two real data sets, and experimentally evaluate its performance.

Conclusion

In this paper, we considered the mixtures of common factor analyzers from the Bayesian perspective and proposed the BMCFA. In the BMCFA model, parameters are treated as stochastic variables with corresponding distributions. To handle the intractable integrals over the parameter space, a variational inference algorithm was presented for approximating the posterior distributions of these stochastic variables. The proposed BMCFA model and the associated variational inference algorithm

References (30)

  • Z. Ghahramani, M.J. Beal, Variational inference for Bayesian mixtures of factor analyzers, in: Proceedings of the...
  • J. Baek, G.J. McLachlan, Mixtures of factor analyzers with common factor loadings for the clustering and visualisation...
  • J. Baek et al., Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010)
  • M.I. Jordan, An introduction to variational methods for graphical models, Machine Learning (1999)
  • H. Attias, A variational Bayesian framework for graphical models, in: Proceedings of Annual Conference on Neural...
Cited by (10)

  • MRF model-based joint interrupted SAR imaging and coherent change detection via variational Bayesian inference

    2018, Signal Processing
    Citation excerpt:

    More specifically, we use a Markov random field (MRF) prior [22–25] to encode both the spatial sparsity and the correlation among changes. Moreover, to surmount the difficulty of calculating the posterior, the mean-field VBEM method [26–29] is utilized to jointly estimate the MRF parameters and the hidden variables. Furthermore, unlike traditional CCD techniques, which have to form images preliminarily, the proposed joint scheme obtains interrupted SAR imaging and change detection results simultaneously by using an adaptive iterative Bayesian inference process.

  • Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values

    2015, Computational Statistics and Data Analysis
    Citation excerpt:

    Viroli (2010) discussed the difference between MFA and FMA models and proposed a general combination of the two models, called mixtures of factor mixture analyzers (MFMA). More recently, Wei and Li (2013) proposed the Bayesian MCFA model and provided a variational inference algorithm for approximating the posterior distributions of model parameters, which are treated as stochastic variables. Missing values usually occur in real-world data for diverse reasons, for example, equipment malfunction, subjects withdrawing from a study, data entered incorrectly, or images corrupted or damaged due to dust or scratches on the slide.

This work was supported by the National Natural Science Foundation of China (Grant nos. 61171153, 61101045 and 61201326), the Foundation for the Author of National Excellent Doctoral Dissertation of PR China, the Scientific Research Foundation for the Returned Overseas Chinese Scholars, the Zhejiang Provincial Natural Science Foundation of China (Grant no. LR12F01001), the Natural Science Fund for Higher Education of Jiangsu Province (Grant no. 12KJB510021), the Scientific Research Fund of Zhejiang Provincial Education Department (Grant no. Y201017301), and the Scientific Research Foundation of NUPT (Grant no. NY211039).