Bayesian mixtures of common factor analyzers: Model, variational inference, and applications☆
Introduction
Factor analysis is a statistical technique that describes variability among high-dimensional observed data in terms of a potentially low-dimensional set of latent variables called factors [1]. It is commonly used to explain observed data and to reduce dimensionality in information science, behavioral science, social science, marketing, product management, and other applied sciences.
Although factor analysis can represent observations in a low-dimensional latent space, its effectiveness is limited by its global linearity. Combining local factor analyzers in a finite mixture yields the so-called mixtures of factor analyzers (MFA) [2]. The MFA can approximate the globally nonlinear latent space of high-dimensional observed variables with a finite mixture of locally linear models, further widening the applicability of factor analysis [3]. The model parameters of the MFA can be estimated under the maximum likelihood criterion. In [2], [4], the expectation–maximization (EM) algorithm [5] and the alternating expectation conditional maximization algorithm were proposed, respectively, for finding maximum likelihood solutions of the MFA parameters. In [6], a fast expectation conditional maximization algorithm was presented; since closed-form expressions are available in all conditional maximization steps, it converges faster than the algorithms in [2], [4]. In [7], a direct procedure based on the maximum likelihood criterion was developed. Moreover, in [8], parameter estimation for the MFA was performed from a Bayesian perspective.
In some applications, when the dimension of the observed data is high and/or the number of mixture components is not small, the number of parameters in the MFA may still be unmanageable. In other words, the MFA must then fit many parameters from limited observed data, which degrades performance. To tackle this problem, a new model, named mixtures of common factor analyzers (MCFA) [9], [10], was proposed. The difference between the MFA and the MCFA is clear-cut: in the MFA, each mixing component has its own factor loading matrix while the factor distribution is the same for all components, namely the standard normal distribution; in the MCFA, the factor loading matrix is shared by all components, while each mixing component has its own factor distribution. Compared with the MFA, the MCFA has two main properties [10]. First, when the dimension of the observed data is high and/or the number of mixing components is not small, the number of parameters in the MCFA grows slowly, yielding preferable performance at the expense of stronger distributional restrictions on the observed data. Second, since the latent factors in different components are allowed to have different means and covariance matrices, the estimated factors corresponding to the observed data can portray the data in a low-dimensional space, enabling clustering in that space; this cannot be realized by the MFA. For a detailed comparison of the two approaches, refer to [10].
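The contrast in parameter growth can be made concrete with a rough count of free parameters under the usual parameterizations. The sketch below is illustrative only; the exact counts reported in [10] differ slightly because of identifiability restrictions on the loading matrices.

```python
# Illustrative free-parameter counts (a sketch, not the paper's exact formulas).
# MFA: I-1 mixing proportions, I p-dim means, I p-by-q loading matrices,
#      and a shared p-dim diagonal noise covariance.
# MCFA: I-1 mixing proportions, one shared p-by-q loading matrix, I q-dim
#       factor means, I symmetric q-by-q factor covariances, and a p-dim
#       diagonal noise covariance.

def mfa_param_count(p, q, I):
    return (I - 1) + I * p + I * p * q + p

def mcfa_param_count(p, q, I):
    return (I - 1) + p * q + I * q + I * q * (q + 1) // 2 + p

p, q = 50, 3  # 50-dimensional observations, 3 latent factors
for I in (5, 20):
    print(f"I={I}: MFA={mfa_param_count(p, q, I)}, "
          f"MCFA={mcfa_param_count(p, q, I)}")
```

For p = 50 and q = 3, the MFA count roughly quadruples as I goes from 5 to 20, while the MCFA count grows much more slowly, since the dominant p×q loading block is shared across components.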
In the MCFA, model parameters are estimated under the maximum likelihood criterion by the EM algorithm. Although this kind of estimation is widely used in other statistical models, it has several limitations. First, because the estimation is driven solely by the likelihood, it inevitably suffers from the presence of singularities in the likelihood function [11]. Second, because maximum likelihood estimation considers only the fitting performance on the observed data, it tends to select an overly complex model structure, leading to overfitting. In real applications, the optimal model structure and parameters should be obtained by trading off fitting performance against model complexity.
To address these issues, in this paper we consider the MCFA from a Bayesian perspective. To this end, prior distributions are first placed on the parameters of the MCFA; Bayes' rule then yields the corresponding posterior distributions. The resulting model is named Bayesian mixtures of common factor analyzers (BMCFA). Under the Bayesian framework, the most difficult task of inference is evaluating integrals of multivariate distributions over the entire parameter space when computing posteriors. As these integrals are intractable in most cases, several approximation methods have been proposed. Variational inference [12], [13], [14], originating from mean field theory, is one of the most representative approaches and has been successfully applied to several models and applications [15], [16], [17], [18], [19]. In this paper, we derive a variational inference algorithm for the BMCFA model; the algorithm is deterministic and yields computationally tractable expressions for the posteriors, realized as an iterative hyperparameter update process. The BMCFA approach (the BMCFA model plus the variational inference algorithm) tackles the issues of the MCFA listed above. Moreover, when the BMCFA approach is used for clustering high-dimensional observed data, it can automatically determine the appropriate number of clusters.
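In generic notation (with $\Theta$ collecting all BMCFA parameters; the paper's own symbols may differ), the construction just described rests on Bayes' rule, whose normalizing constant is the intractable integral in question:

```latex
% Generic form of the Bayesian construction; \Theta collects all
% model parameters endowed with prior distributions.
p(\Theta \mid Y) \;=\; \frac{p(Y \mid \Theta)\, p(\Theta)}{p(Y)},
\qquad
p(Y) \;=\; \int p(Y \mid \Theta)\, p(\Theta)\, \mathrm{d}\Theta .
```

It is the evidence $p(Y)$, a high-dimensional integral over the full parameter space, that variational inference sidesteps by optimizing a tractable approximation to the posterior.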
The rest of this paper is organized as follows. In Section 2, a brief overview of the MFA and the MCFA is provided. In Section 3, the BMCFA model is proposed. In Section 4, a variational inference algorithm for the BMCFA model is derived. In Section 5, the proposed BMCFA model and the associated variational inference algorithm are applied to clustering a high-dimensional synthetic data set, the wine data, and the gene expression data. Finally, conclusions are drawn in Section 6.
Mixtures of factor analyzers (MFA)
Let the observed data set be $Y=\{y_1,\dots,y_N\}$. In the MFA, each $p$-dimensional data vector $y_n$ is assumed to be generated as $y_n=\mu_i+\Lambda_i x_n+e_n$ with probability $\pi_i$ $(i=1,\dots,I)$, where $I$ is the number of mixing components. The corresponding $q$-dimensional factor $x_n\sim\mathcal{N}(0,I_q)$ is independent of the i.i.d. noise $e_n\sim\mathcal{N}(0,\Psi)$, where $\Psi$ is a $p\times p$ diagonal matrix. The parameter $\mu_i$ is the mean of the $i$th analyzer and $\Lambda_i$ ($p\times q$) is the linear transformation known as the factor loading matrix. The mixing proportions $\pi_i$ satisfy $\pi_i\ge 0$ and $\sum_{i=1}^{I}\pi_i=1$.
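The generative process above can be sketched directly: pick a component, draw a standard normal factor, and add component-specific structure plus diagonal noise. This is a minimal stdlib-only sketch; the function name and toy parameters below are illustrative, not from the paper.

```python
import random

def sample_mfa(pis, mus, loadings, psi_diag):
    """Draw one observation from the MFA generative model sketched above.

    pis      -- mixing proportions (length I, summing to one)
    mus      -- per-component p-dimensional means
    loadings -- per-component p-by-q factor loading matrices (lists of rows)
    psi_diag -- the p diagonal entries of the noise covariance Psi
    """
    i = random.choices(range(len(pis)), weights=pis)[0]  # pick a component
    q = len(loadings[i][0])
    x = [random.gauss(0.0, 1.0) for _ in range(q)]       # factor ~ N(0, I_q)
    e = [random.gauss(0.0, v ** 0.5) for v in psi_diag]  # noise ~ N(0, Psi)
    y = [mus[i][d]
         + sum(loadings[i][d][j] * x[j] for j in range(q))
         + e[d]
         for d in range(len(psi_diag))]
    return i, y

# A toy two-component model with p = 3 observed dimensions, q = 2 factors.
pis = [0.4, 0.6]
mus = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
loadings = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
            [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]]
psi_diag = [0.1, 0.1, 0.1]
component, y = sample_mfa(pis, mus, loadings, psi_diag)
```

Note that each component carries its own loading matrix here, which is exactly the parameterization the MCFA replaces with a single shared loading matrix.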
Bayesian mixtures of common factor analyzers (BMCFA)
Let us now consider the problem of treating the MCFA in the Bayesian framework. Since the BMCFA is a mixture model, an $I$-dimensional binary latent stochastic variable $z_n=(z_{n1},\dots,z_{nI})^{\mathrm T}$ is introduced, associated with the observed data $y_n$. In $z_n$, the element $z_{ni}$ is one if $y_n$ belongs to the $i$th component of the mixture and zero otherwise. The whole latent variable set is represented as $Z=\{z_n\}_{n=1}^{N}$.
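The indicator convention, and the soft assignments its posterior expectations produce, can be illustrated in a few lines. The helper names below are hypothetical, introduced only for this sketch:

```python
def one_hot(i, I):
    """Hard assignment z_n: z_ni = 1 iff y_n came from the ith component."""
    return [1 if j == i else 0 for j in range(I)]

def responsibilities(weighted_likelihoods):
    """Posterior expectations E[z_ni] (soft assignments), obtained by
    normalizing the per-component weighted likelihoods
    pi_i * p(y_n | component i)."""
    total = sum(weighted_likelihoods)
    return [w / total for w in weighted_likelihoods]

z = one_hot(2, 4)                 # [0, 0, 1, 0]
r = responsibilities([1.0, 3.0])  # [0.25, 0.75]
```

While each $z_n$ is binary, inference works with its expectation, a probability vector over the $I$ components; clustering then assigns $y_n$ to the component with the largest responsibility.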
According to (2), the marginal distribution over $Z$ is specified in terms of the mixing proportions as $p(z_n)=\prod_{i=1}^{I}\pi_i^{z_{ni}}$.
Variational inference algorithm for the BMCFA
After the BMCFA is established, the next task is to obtain the posterior distributions of the stochastic variables in Fig. 1, which is realized by the inference process. For inference in the Bayesian framework, we need the expression of the log-likelihood $\ln p(Y)$. Unfortunately, the multiple interacting stochastic variables in the model make the required integral intractable. To overcome this difficulty, we use a variational approximation method to construct a tractable lower bound on the log-likelihood.
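Such variational treatments build on the standard decomposition of the log-likelihood; the following is the generic machinery (not the paper's exact derivation), with $\Theta$ collecting all stochastic variables of the model:

```latex
% Decomposition into an evidence lower bound and a KL divergence:
\ln p(Y) \;=\; \mathcal{L}(q)
  + \mathrm{KL}\bigl(q(\Theta)\,\|\,p(\Theta \mid Y)\bigr),
\qquad
\mathcal{L}(q) \;=\; \int q(\Theta)\,
  \ln \frac{p(Y,\Theta)}{q(\Theta)}\, \mathrm{d}\Theta .

% Under the mean-field factorization q(\Theta) = \prod_j q_j(\Theta_j),
% each factor is updated in turn via
\ln q_j^{\ast}(\Theta_j) \;=\;
  \mathbb{E}_{k \neq j}\bigl[\ln p(Y,\Theta)\bigr] + \text{const}.
```

Since the KL term is nonnegative, maximizing $\mathcal{L}(q)$ over the factorized family drives $q(\Theta)$ toward the true posterior, and cycling the factor updates yields exactly the kind of iterative hyperparameter update process described above.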
Experimental results
In this section, we apply the proposed BMCFA approach to clustering a synthetic data set and two real data sets, and experimentally evaluate the performance of this approach.
Conclusion
In this paper, we considered the mixtures of common factor analyzers from the Bayesian perspective and proposed the BMCFA. In the BMCFA model, parameters are treated as stochastic variables with corresponding distributions. In order to calculate the intractable integrals in the parameter space, a variational inference algorithm was presented for approximating the posterior distributions of these stochastic variables. The proposed BMCFA model and the associated variational inference algorithm were then applied to clustering high-dimensional synthetic and real data sets.
References (30)
- et al., Modelling high-dimensional data by mixtures of factor analyzers, Computational Statistics and Data Analysis (2003)
- et al., Maximum likelihood estimation of mixtures of factor analyzers, Computational Statistics and Data Analysis (2011)
- et al., Singularities in mixture models and upper bounds of stochastic complexity, Neural Networks (2003)
- et al., The infinite Student's t-mixture for robust modeling, Signal Processing (2012)
- Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell (2002)
- et al., Finite Mixture Models (2000)
- Z. Ghahramani, G.E. Hinton, The EM Algorithm for Mixtures of Factor Analyzers, Technical Report CRG-TR-96-1, University...
- et al., Learning low-dimensional signal models, IEEE Signal Processing Magazine (2011)
- et al., Maximum likelihood from incomplete data via EM algorithm, Journal of the Royal Statistical Society, Series B (1977)
- et al., Fast ML estimation for the mixture of factor analyzers via an ECM algorithm, IEEE Transactions on Neural Networks (2008)
- Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data, IEEE Transactions on Pattern Analysis and Machine Intelligence
- An introduction to variational methods for graphical models, Machine Learning
Cited by (10)
- MRF model-based joint interrupted SAR imaging and coherent change detection via variational Bayesian inference, Signal Processing (2018)
- Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values, Computational Statistics and Data Analysis (2015)
- Clustering Analysis of Classified Performance Evaluation of Higher Education in Shanghai Based on Topsis Model, Sustainability (2023)
- Variational Bayesian analysis for two-part latent variable model, Computational Statistics (2023)
- A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering, Statistics and Computing (2021)
☆ This work was supported by the National Natural Science Foundation of China (Grant nos. 61171153, 61101045 and 61201326), the Foundation for the Author of National Excellent Doctoral Dissertation of PR China, the Scientific Research Foundation for the Returned Overseas Chinese Scholars, the Zhejiang Provincial Natural Science Foundation of China (Grant no. LR12F01001), the Natural Science Fund for Higher Education of Jiangsu Province (Grant no. 12KJB510021), the Scientific Research Fund of Zhejiang Provincial Education Department (Grant no. Y201017301), and the Scientific Research Foundation of NUPT (Grant no. NY211039).