
Bayesian sparse covariance decomposition with a graphical structure


Abstract

We consider the problem of estimating covariance matrices that have a particular structure: the sum of a low-rank component and a sparse component. This general covariance structure is encountered in multiple statistical models, including factor analysis and random effects models, where the low-rank component captures the correlations among variables induced by the latent factors or random effects and the sparse component captures the correlations of the remaining residuals. We propose a Bayesian method for estimating covariance matrices of such structures by representing the covariance model as a factor model with an unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation of the low-rank component, combined with a Bayesian lasso method for estimating the sparse component. Simulation studies show that our method can recover the rank as well as the sparsity of the two respective components. We further extend our method to a latent-factor Markov graphical model, with a focus on the sparse conditional graphical model of the residuals as well as on selecting the number of factors. We show through simulations that our Bayesian model successfully recovers both the number of latent factors and the Markov graphical model of the residuals.


References

  • Aguilar, O., West, M.: Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Bus. Econ. Stat. 18, 338–357 (2000)


  • Armstrong, H., Carter, C., Wong, K., Kohn, R.: Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat. Comput. 19(3), 303–316 (2009)


  • Atay-Kayis, A., Massam, H.: The marginal likelihood for decomposable and non-decomposable graphical Gaussian models. Biometrika 92, 317–335 (2005)


  • Baladandayuthapani, V., Talluri, R., Ji, Y., Coombes, K.R., Hennessy, B.T., Davies, M.A., Mallick, B.K.: Bayesian sparse graphical models for classification with application to protein expression data. Ann. Appl. Stat. 8(3), 1443–1468 (2014)

  • Bhattacharya, A., Dunson, D.: Sparse Bayesian infinite factor models. Biometrika 98, 291–306 (2011)


  • Bickel, P.J., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36, 2577–2604 (2008a)


  • Bickel, P.J., Levina, E.: Regularized estimation of large covariance matrices. Ann. Stat. 36, 199–227 (2008b)


  • Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98, 807–820 (2011)


  • Bonato, V., Baladandayuthapani, V., Broom, B.M., Sulman, E.P., Aldape, K.D., Do, K.: Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics 27, 359–367 (2010)


  • Brooks, S., Giudici, P., Roberts, G.: Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. J. R. Stat. Soc. Ser. B 65(1), 3–39 (2003)

  • Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684 (2011)


  • Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), Article 11 (2011)

  • Carvalho, C.M., Chang, J., Lucas, J.E., Nevins, J., Wang, Q., West, M.: High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008)


  • Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)

  • Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)


  • Chiu, T.Y.M., Leonard, T., Tsui, K.W.: The matrix-logarithmic covariance model. J. Am. Stat. Assoc. 91, 198–210 (1996)


  • Dawid, A.P.: Conditional independence in statistical theory (with Discussion). J. R. Stat. Soc. 41, 1–31 (1979)


  • Dawid, A.P., Lauritzen, S.L.: Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 21(3), 1272–1317 (1993)


  • Deng, X., Tsui, K.W.: Penalized covariance matrix estimation using a matrix-logarithm transformation. J. Comput. Graph. Stat. 22, 494–512 (2013)

  • Fabrigar, L.R., Wegener, D.T., MacCallum, R.C., Strahan, E.J.: Evaluating the use of exploratory factor analysis in psychological research. Psychol. Methods 3, 272–299 (1999)


  • Fan, J., Fan, Y., Lv, J.: High dimensional covariance matrix estimation using a factor model. J. Econom. 147, 186–197 (2008)


  • Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)


  • Gelfand, A.E., Sahu, S.: Identifiability, improper priors, and Gibbs sampling for generalized linear models. J. Am. Stat. Assoc. 94, 247–253 (1999)

  • Geweke, J.F., Zhou, G.: Measuring the pricing error of the arbitrage pricing theory. Rev. Financ. Stud. 9, 557–587 (1996)


  • Ghosh, J., Dunson, D.: Bayesian model selection in factor analytic models. In: Dunson, D. (ed.) Random Effect and Latent Variable Model Selection. Wiley, New York (2008)


  • Giudici, P.: Bayesian inference for graphical factor analysis models. Psychometrika 66(4), 577–592 (2001)


  • Giudici, P., Green, P.J.: Decomposable graphical Gaussian model determination. Biometrika 86, 785–801 (1999)


  • Glorfeld, L.W.: An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educ. Psychol. Meas. 55, 377–393 (1995)


  • Grzebyk, M., Wild, P., Chouanière, D.: On identification of multi-factor models with correlated residuals. Biometrika 91(1), 141–151 (2004)


  • Hoff, P.: Model averaging and dimension selection for the singular value decomposition. J. Am. Stat. Assoc. 102, 674–685 (2007)

  • Hoff, P., Niu, X.: A covariance regression model. Stat. Sin. 22, 729–753 (2012)


  • Huang, J., Liu, N., Pourahmadi, M., Liu, L.: Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93, 85–98 (2006)


  • Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)


  • Knowles, D., Ghahramani, Z.: Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5, 1534–1552 (2011)


  • Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)


  • Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88(2), 365–411 (2004)

  • Lee, S., Song, X.: Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika 29(1), 23–39 (2002)

  • Leonard, T., Hsu, J.S.J.: Bayesian inference for a covariance matrix. Ann. Stat. 20, 1669–1696 (1992)

  • Levina, E., Rothman, A.J., Zhu, J.: Sparse estimation of large covariance matrices via a nested lasso penalty. Ann. Appl. Stat. 1, 245–263 (2008)

  • Lindley, D.: Bayesian Statistics: A Review. SIAM, Philadelphia, PA (1971)

  • Liu, R.Z., Graham, K., Glubrecht, D.D., Germain, D.R., et al.: Association of FABP5 expression with poor survival in triple-negative breast cancer: implication for retinoic acid therapy. Am. J. Pathol. 178(3), 997–1008 (2011)


  • Liu, W., Luo, X.: High-dimensional sparse precision matrix estimation via sparse column inverse operator. Technical report. arXiv:1203.3896 (2012)

  • Lopes, H.F., Carvalho, C.M.: Factor stochastic volatility with time varying loadings and Markov switching regimes. J. Stat. Plan. Inference 137, 3082–3091 (2007)


  • Lopes, H.F., West, M.: Bayesian model assessment in factor analysis. Stat. Sin. 14, 41–67 (2004)


  • Luo, X.: High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv:1111.1133v1 (2013)

  • Pati, D., Bhattacharya, A., Pillai, N.S., Dunson, D.: Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42(3), 1102–1130 (2014)


  • Poirier, D.J.: Revisiting beliefs in non-identified models. Technical report, University of Toronto (1998)

  • Roverato, A.: Hyper-inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Stat. 29, 391–411 (2002)


  • Rowe, D.B.: Multivariate Bayesian Statistics. Chapman & Hall, London (2003)

  • Stanghellini, E.: Identification of a single-factor model using graphical Gaussian rules. Biometrika 84(1), 241–244 (1997)

  • Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., et al.: Genome-wide associations of gene expression variation in humans. PLoS Genet. 1(6), e78 (2005)

  • Wang, H.: Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 7, 867–886 (2012)


  • Wong, F., Carter, C., Kohn, R.: Efficient estimation of covariance selection models. Biometrika 90, 809–830 (2003)


  • Wu, W.B., Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844 (2003)


  • Wu, W.B., Pourahmadi, M.: Banding sample autocovariance matrices of stationary processes. Stat. Sin. 19, 1755–1768 (2009)

  • Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)



Author information


Corresponding author

Correspondence to Lin Zhang.

Appendices

Appendix 1: Derivation of the conditional densities for the elements in \(S\)

1.1 The conditional densities for the diagonal elements in S

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

For each diagonal element \(S_{jj}\) in S, the full conditional density is

$$\begin{aligned} p(S_{jj} | \cdot ) \propto&|S|^{-\frac{n}{2}} e^{- (tr(S^{-1} \varLambda ) + \lambda S_{jj})/2 } \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=q\). Let \(S = \left[ \begin{array}{cc} S_{-j,-j} & S_{-j,j} \\ S_{j,-j} & S_{jj} \end{array} \right] \). By the standard properties of block matrices, we have

$$\begin{aligned} |S|&= \det (S_{-j,-j}) \cdot \det (S_{jj}-S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}) \\&\propto (S_{jj}-c), \quad \text {where } c = S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}, \end{aligned}$$
$$\begin{aligned} S^{-1}&= \left[ \begin{array}{cc} S^{-1}_{-j,-j} & 0 \\ 0 & 0 \end{array} \right] + \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] (S_{jj}-c)^{-1} \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] , \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda )&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} & 0 \\ 0 & 0 \end{array} \right] \varLambda \right) + tr \left( \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] (S_{jj}-c)^{-1} \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \right) \\&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} & 0 \\ 0 & 0 \end{array} \right] \varLambda \right) + d(S_{jj}-c)^{-1}, \\ \text {where } d&= \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] . \end{aligned}$$

Hence, we have

$$\begin{aligned} p(S_{jj} | \cdot ) \propto (S_{jj}-c)^{-\frac{n}{2}} e^{-\{ d(S_{jj}-c)^{-1} +\lambda S_{jj} \}/2 } \mathcal {I} \{S_{jj}>c\}. \end{aligned}$$
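
The two identities above are easy to check numerically. The following short NumPy snippet (ours, purely for illustration; all variable names are ad hoc) verifies both the determinant factorization and the trace decomposition on a random positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive definite S (q x q) and symmetric Lambda, with j = q.
q = 5
A = rng.standard_normal((q, q))
S = A @ A.T + q * np.eye(q)
L = rng.standard_normal((q, q))
Lam = L @ L.T  # plays the role of Lambda = (y - Mf)(y - Mf)^T

S11, s12, s22 = S[:-1, :-1], S[:-1, -1], S[-1, -1]
c = s12 @ np.linalg.solve(S11, s12)             # c = S_{j,-j} S_{-j,-j}^{-1} S_{-j,j}
v = np.append(np.linalg.solve(S11, s12), -1.0)  # the column [S_{-j,-j}^{-1} S_{-j,j}; -1]
d = v @ Lam @ v                                 # the scalar d defined above

# |S| = det(S_{-j,-j}) * (S_jj - c)
assert np.isclose(np.linalg.det(S), np.linalg.det(S11) * (s22 - c))

# tr(S^{-1} Lambda) = tr(block-diagonal term) + d * (S_jj - c)^{-1}
lhs = np.trace(np.linalg.solve(S, Lam))
rhs = np.trace(np.linalg.solve(S11, Lam[:-1, :-1])) + d / (s22 - c)
assert np.isclose(lhs, rhs)
```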

1.2 The conditional densities for the off-diagonal elements in S

The conditional density of the off-diagonal element of S, \(S_{jj^{\prime }}\), is

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto&\, |S|^{-\frac{n}{2}} e^{-tr(S^{-1}\varLambda )/2} \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Let \(S = \left[ \begin{array}{cc} S_{-(jj^{\prime })} & S_{-(jj^{\prime }),jj^{\prime }} \\ S_{jj^{\prime },-(jj^{\prime })} & S_{jj^{\prime },jj^{\prime }} \end{array} \right] \), where \(S_{jj^{\prime },jj^{\prime }}=\left[ \begin{array}{cc} S_{jj} & S_{jj^{\prime }} \\ S_{jj^{\prime }} & S_{j^{\prime }j^{\prime }} \end{array} \right] \).

By the properties of block matrices, we have

$$\begin{aligned} |S|&\propto \det (S_{jj^{\prime },jj^{\prime }}-B), \end{aligned}$$
$$\begin{aligned} S^{-1}&= \left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} & 0 \\ 0 & 0 \end{array} \right] + \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] (S_{jj^{\prime },jj^{\prime }}-B)^{-1} \left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] , \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda )&= tr\left( \left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} & 0 \\ 0 & 0 \end{array} \right] \varLambda \right) + tr\left( (S_{jj^{\prime },jj^{\prime }}-B)^{-1}D \right) , \end{aligned}$$

where

$$\begin{aligned} B&= S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D&= \left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] \varLambda \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] . \end{aligned}$$

Writing \(S_{jj^{\prime },jj^{\prime }}-B = \left[ \begin{array}{cc} S_{jj}-B_{11} & S_{jj^{\prime }}-B_{12} \\ S_{jj^{\prime }}-B_{12} & S_{j^{\prime }j^{\prime }}-B_{22} \end{array} \right] \) and expanding the \(2\times 2\) determinant and inverse, we have

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot )&\propto |S_{jj^{\prime },jj^{\prime }}-B|^{-\frac{n}{2}} e^{-tr\{(S_{jj^{\prime },jj^{\prime }}-B)^{-1}D\}/2 } \, p(S_{jj^{\prime }}|\lambda ,\rho _{jj^{\prime }})\, \mathcal {I}\{S \in \mathcal {S}^+\} \\&\propto \left\{ 1-\frac{(S_{jj^{\prime }}-B_{12})^2}{(S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22})}\right\} ^{-\frac{n}{2}} \\&\quad \times \exp \left[ -\frac{1}{2}\, \frac{ (S_{j^{\prime }j^{\prime }}-B_{22})D_{11}+(S_{jj}-B_{11})D_{22} -2(S_{jj^{\prime }}-B_{12})D_{12} }{ (S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22}) -(S_{jj^{\prime }}-B_{12})^2 } \right] \\&\quad \times \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\quad \times \mathcal {I} \left\{ (S_{jj^{\prime }}-B_{12})^2<(S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22}) \right\} . \end{aligned}$$

Appendix 2: Posterior inference for the Bayesian covariance decomposition model

In this section, we present the full conditional posterior distributions for the proposed Bayesian covariance decomposition model under the assumption of a sparse covariance graphical model for the residuals, together with a framework for carrying out the Markov chain Monte Carlo (MCMC) computations. The full conditional distributions of most parameters are available in closed form, allowing for a straightforward Gibbs sampling algorithm; the exception is the off-diagonal elements of the sparse matrix \(S\), for which we employ an independent Metropolis–Hastings (MH) sampler within the Gibbs sampler.
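
To fix ideas, here is a minimal Python/NumPy sketch of one such closed-form update, the draw of a column of \(M\) in step 1 below; the function and variable names are ours, not the authors', and the remaining conjugate updates (steps 2–5, 8 and 9) follow the same pattern.

```python
import numpy as np

def sample_M_column(y, M, f, S_inv, k, rng):
    """Gibbs update: draw the k-th column of the loading matrix M from
    N(mu_k^M, Sigma_k^M), as in step 1 below (a sketch; names are ours).

    y: (q, n) data matrix, M: (q, r) current loadings, f: (r, n) factors,
    S_inv: (q, q) inverse of the sparse residual covariance S.
    """
    q = y.shape[0]
    # Residual with the k-th factor's contribution removed: y - M_(-k) f_(-k).
    resid = y - np.delete(M, k, axis=1) @ np.delete(f, k, axis=0)
    prec = S_inv * np.sum(f[k] ** 2) + q * np.eye(q)  # (Sigma_k^M)^{-1}
    Sigma_k = np.linalg.inv(prec)
    mu_k = Sigma_k @ (S_inv @ resid @ f[k])           # posterior mean mu_k^M
    return rng.multivariate_normal(mu_k, Sigma_k)
```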

  1.

    Sampling the factor loading matrix \(M\): Let \(\mathbf {M}_k=(m_{1k},\ldots ,m_{qk})^T\) be the \(k\)th column vector of \(M\), and \(M_{(-k)}\) be the matrix of \(M\) excluding the \(k\)th column. The full conditional distribution of \(\mathbf {M}_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} \mathbf {M}_k|y,M_{(-k)},f,S&\sim \mathcal {N}_q(\varvec{\mu }^M_k,\varSigma ^M_k), \end{aligned}$$

    where

    $$\begin{aligned} \varSigma ^M_k =&\,\Big \{ S^{-1}\left( \sum ^n_{i=1}f^2_{ki} \right) + qI_q \Big \}^{-1}, \\ \varvec{\mu }^M_k =&\, \varSigma ^M_k S^{-1}(y-M_{ (-k)}f_{(-k)\cdot })\mathbf {f}_{k\cdot }^T. \end{aligned}$$

    Hence, we can draw samples of each column of \(M\) from a multivariate Gaussian distribution.

  2.

    Sampling the random factors \(f\): Let \(f\) be the \(r \times n\) matrix whose entry \(f_{ki}\) is the value of the \(k\)th factor in the \(i\)th replicate. Then \(\mathbf {f}_{k\cdot } = (f_{k1},\ldots ,f_{kn})\) is the \(k\)th row vector of \(f\), and \(f_{(-k)\cdot }\) denotes the matrix of \(f\) excluding the \(k\)th row. Note that \(\mathbf {f}_{k\cdot }\) can be viewed as the unobserved values of the random factor \(k\) in the \(n\) replicates. The full conditional distribution of the transpose of \(\mathbf {f}_{k\cdot }\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} \mathbf {f}^T_{k\cdot } | y,M, f_{(-k)\cdot }, S,z_k,\tau ^2_k \sim (1-z_k)\mathcal {N}_n(0,\tau _k^2 I_n) + z_k \mathcal {N}_n(\varvec{\mu }^f_k, \sigma ^f_k I_n), \end{aligned}$$

    where

    $$\begin{aligned} \sigma ^f_k =&\,\Big ( M^T_k S^{-1} M_k +\tau ^{-2}_k \Big ) ^{-1}, \\ \varvec{\mu }^{f}_k =&\,\sigma ^{f}_k \Big \{ M^T_kS^{-1}(y-M_{(-k)}f_{(-k)\cdot }) \Big \}^T. \end{aligned}$$

    Hence we can draw samples of each row of \(f\) from a mixture of two multivariate Gaussian distributions.

  3.

    Sampling the binary diagonal matrix \(Z\): The full conditional of each diagonal element of \(Z\), \(z_k\), is a Bernoulli distribution:

    $$\begin{aligned} z_k | y,M,f,S,\tau ^2_k,p_k \sim&Bernoulli(p^*_k), \end{aligned}$$

    where \(p^*_k = 1- (1-p_k)/\Big \{ 1-p_k+p_k\cdot \Big ( \frac{\sigma ^f_k}{\tau ^2_k} \Big )^{\frac{n}{2}} \times \exp \Big ( - \frac{(\varvec{\mu }^f_k)^T \varvec{\mu }^f_k}{2\sigma ^f_k} \Big ) \Big \}\).

  4.

    Sampling the probabilities \(p_k\) and \(\pi \): The full conditional of \(p_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} p_k | z_k=1&\sim Beta ( a_p+1, b_p ), \\ p_k | z_k=0&\sim (1-\pi ^*) \mathcal {I}\{p_k=0\} + \pi ^*Beta(a_p, b_p+1), \end{aligned}$$

    where \(\pi ^* = \frac{\pi b_p}{a_p+b_p-\pi a_p}\). The full conditional of \(\pi \) is

    $$\begin{aligned} Beta \left( a_{\pi }+\sum _k \mathcal {I}\{p_k=0\}, b_{\pi }+ \sum _k \mathcal {I}\{p_k \ne 0 \} \right) . \end{aligned}$$
  5.

    Sampling the positive diagonal matrix \(D_\tau \): The full conditional of each diagonal element of \(D_\tau \), \(\tau ^2_k\), for \(k=1,\ldots ,r\) is

    $$\begin{aligned} (1-z_k) \mathrm{IG}(a_\tau ,b_\tau ) + z_k \mathrm{IG} \left( a_\tau +\frac{n}{2}, b_\tau +\frac{\mathbf {f}_{k \cdot } \mathbf {f}^T_ {k\cdot }}{2} \right) . \end{aligned}$$
  6.

    Sampling the diagonal elements \(S_{jj}\): The full posterior conditional density of the diagonal element \(S_{jj}\), for \(j=1,\ldots ,q\), is

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, |S|^{-\frac{n}{2}} e^ {-\frac{tr(\varLambda S^{-1})}{2}-\frac{\lambda S_{jj} }{2} } \mathcal {I} \{S \in \mathcal {S}^+ \}, \end{aligned}$$

    where \(\varLambda =(y-Mf)(y-Mf)^T\). Without loss of generality, suppose that \(j=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, (S_{jj}-c)^{-\frac{n}{2}} e^{-\frac{d}{2}(S_{jj}-c)^{-1} -\frac{\lambda }{2} S_{jj} } \mathcal {I} \{S_{jj}>c\}, \end{aligned}$$

    where \(c = S_{j,-j} S^{-1}_{-j,-j} S_{-j,j}\) and \(d = \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] ^T\). Note that the indicator function \(\mathcal {I} \{S_{jj}>c\}\) ensures the positive definiteness of \(S\) conditional on the other elements of \(S\). The derivation of the full conditional distributions for the elements of \(S\) is detailed in Appendix 1. The above distribution has a closed form: transforming \(S_{jj}\) to \(\nu =S_{jj}-c\), the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto \, \nu ^{-\frac{n}{2}} \exp \{-(d/\nu +\lambda \nu )/2\} \mathcal {I} \{\nu >0\}, \end{aligned}$$

    which is a generalized inverse Gaussian (GIG) distribution with parameters \((1-n/2,d,\lambda )\). Therefore, at each MCMC iteration, we can draw a sample of \(\nu \) from the GIG distribution and obtain \(S_{jj}=\nu +c\); a code sketch of this draw follows the list.

  7.

    Sampling the off-diagonal elements \(S_{jj^{\prime }}\): The full posterior conditional density of the off-diagonal element \(S_{jj^{\prime }}\), \(j<j^{\prime }\), is

    $$\begin{aligned} p(S_{jj^{\prime }} | \cdot ) \propto \,&|S|^{-\frac{n}{2}} e^{-tr(\varLambda S^{-1})/2 } \Big [(1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+\} . \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto \,&\left( 1-\frac{\nu ^2}{ab}\right) ^{-\frac{n}{2}} \exp \left\{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \right\} \\&\times \Big [ (1-\rho _{jj^{\prime }})\mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \{\nu ^2<ab\}, \end{aligned}$$

    where

    $$\begin{aligned} \nu =&\,S_{jj^{\prime }}-B_{12}, \\ a=&\,S_{jj}-B_{11}, \\ b=&\,S_{j^{\prime }j^{\prime }}-B_{22}, \\ B=&\,S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D =&\left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}&-I_2 \end{array} \right] \varLambda \left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}&-I_2 \end{array} \right] ^T. \end{aligned}$$

    Under this transformation, the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto&\left( 1-\rho _{jj^{\prime }}\right) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab\} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g(\nu ), \end{aligned}$$

    where

    $$\begin{aligned} g(\nu ) =&\, (1-\frac{\nu ^2}{ab})^{-\frac{n}{2}} \exp \Big \{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \\&-\lambda |\nu +B_{12}| \Big \} \mathcal {I} \{\nu ^2<ab\}. \end{aligned}$$

    Note that the positive definite constraint on \(S\) is ensured by the indicator \(\mathcal {I} \{\nu ^2<ab\}\). That is, the posterior distribution of \(\nu \) has support only on \((-\sqrt{ab},\sqrt{ab})\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Furthermore, \(g(\nu )\) is nonconcave, so the sampler may get trapped locally if we use a random-walk MH algorithm within the Gibbs sampler. Since \(g(\nu )\) has density only over \((-\sqrt{ab},\sqrt{ab})\) and is zero elsewhere, we construct a piecewise uniform proposal distribution that approximates \(g(\nu )\). We choose \(\kappa -1\) equally spaced grid points in \((-\sqrt{ab},\sqrt{ab})\), giving \(-\sqrt{ab}=\nu _0<\nu _1<\cdots <\nu _{\kappa }=\sqrt{ab}\), which divide the domain of \(\nu \) into \(\kappa \) intervals of width \(2\sqrt{ab}/\kappa \). The piecewise uniform distribution is as follows:

    $$\begin{aligned} g_a(\nu ) = \left\{ \begin{array}{ll} g\left( \frac{\nu _0 + \nu _1}{2}\right) &{} \quad \text{ if } \, \nu _0 < \nu \le \nu _1 \\ g\left( \frac{\nu _1 + \nu _2}{2}\right) &{} \quad \text{ if } \, \nu _1 < \nu \le \nu _2 \\ \cdots &{} \\ g\left( \frac{\nu _{\kappa -1} + \nu _\kappa }{2}\right) &{} \quad \text{ if } \nu _{\kappa -1} < \nu < \nu _\kappa . \end{array} \right. \end{aligned}$$

    The independent MH proposal distribution for generating \(\nu =S_{jj^{\prime }}-B_{12}\) is given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\, (1-\rho _{jj^{\prime }}) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab \} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated from the proposal distribution using an inverse-cumulative distribution function (CDF) method, and the proposal \(\nu ^*\) is accepted with probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = S^c_{jj^{\prime }}-B_{12}\) denotes the current state of \(\nu \). The piecewise uniform proposal distribution avoids the local-trap problem and can be sampled easily using an inverse-CDF method. Furthermore, \(q(\nu | \cdot )\) approximates the target \(p(\nu |\cdot )\) more accurately as the number of grid points increases. Based on our simulations, \(100\) grid intervals are enough for fast convergence of \(S_{jj^{\prime }}\); a code sketch of this independent MH step follows the list.

  8.

    Sampling the shrinkage parameter \(\lambda \): The full conditional of \(\lambda \) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|S_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q S_{jj}\right) , \end{aligned}$$

    where \(m\) equals the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, S_{jj^{\prime }} \ne 0 \}\).

  9.

    Sampling the selection parameters \(\rho _{jj^{\prime }}\): The full conditional of \(\rho _{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{S_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ S_{jj^{\prime }}=0 \}). \end{aligned}$$
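
The two nonstandard updates above, the GIG draw of step 6 and the independent MH move of step 7, are sketched below in Python, assuming SciPy is available; the function and argument names are ours, and `g` stands for the function \(g(\nu )\) defined in step 7. SciPy's `geninvgauss(p, b)` has density proportional to \(x^{p-1}e^{-b(x+1/x)/2}\), so the three-parameter \(\mathrm{GIG}(p,\chi ,\psi )\) is recovered by setting \(b=\sqrt{\chi \psi }\) and the scale to \(\sqrt{\chi /\psi }\).

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_S_diag(d, c, lam, n, rng):
    """Step 6: draw nu ~ GIG(1 - n/2, d, lam) and return S_jj = nu + c."""
    nu = geninvgauss.rvs(1.0 - n / 2.0, np.sqrt(d * lam),
                         scale=np.sqrt(d / lam), random_state=rng)
    return nu + c

def sample_S_offdiag_nu(g, B12, a, b, rho, lam, nu_curr, rng, n_grid=100):
    """Step 7: independent MH update of nu = S_jj' - B12, with a
    piecewise-uniform approximation g_a of g on (-sqrt(ab), sqrt(ab))."""
    bound = np.sqrt(a * b)
    edges = np.linspace(-bound, bound, n_grid + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    g_mid = np.array([g(m) for m in mids])                     # heights of g_a
    w_cont = 0.5 * rho * lam * g_mid * (2.0 * bound / n_grid)  # interval masses
    # Point mass at nu = -B12 (i.e. S_jj' = 0), if it lies in the support.
    w_zero = (1.0 - rho) * g(-B12) if B12 ** 2 < a * b else 0.0

    # Inverse-CDF draw from the mixture proposal q(nu | .).
    if rng.uniform() * (w_zero + w_cont.sum()) < w_zero:
        nu_prop = -B12
    else:
        i = rng.choice(n_grid, p=w_cont / w_cont.sum())
        nu_prop = rng.uniform(edges[i], edges[i + 1])

    def p_over_q(nu):
        # The atom has the same weight in target and proposal, so it cancels;
        # on the continuous part the ratio reduces to g(nu) / g_a(nu).
        if nu == -B12:
            return 1.0
        i = min(max(int((nu + bound) * n_grid / (2.0 * bound)), 0), n_grid - 1)
        return g(nu) / g_mid[i]

    alpha = min(1.0, p_over_q(nu_prop) / p_over_q(nu_curr))
    return nu_prop if rng.uniform() < alpha else nu_curr
```

The caller recovers \(S_{jj^{\prime }} = \nu + B_{12}\), with \(\nu = -B_{12}\) corresponding to \(S_{jj^{\prime }} = 0\).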

Appendix 3: Posterior inference for the latent-factor conditional graphical model

We derive the full conditionals for all the parameters and perform posterior inference using a Gibbs sampling algorithm. Note that the prior specification for the parameters \(\{M, z_k, \tau _k\}\) in the graphical factor analysis model parallels that of the hierarchical model in formula (9), so the full conditionals of these parameters for a Gibbs algorithm are similar to those in Appendix 2. In this section, we present the sampling algorithm for the parameter set \(\{C,\lambda ^C,\varvec{\rho }^C\}\) of the graphical model of the residuals.

1.1 Sampling \(\{C,\lambda ^C,\rho ^C\}\) for the graphical model of the residuals

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

  1.

    Sampling the diagonal elements of \(C\), \(C_{jj}\), for \(j=1,\ldots ,q\): The full conditional density of \(C_{jj}\) is

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\,|C|^{\frac{n}{2}} e^{ -(\varLambda _{jj}C_{jj} + \lambda ^C C_{jj})/2 } \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q\). Let \(C=R^{\prime }R\) be the Cholesky decomposition of \(C\), where the matrix \(R=(R_{jj^{\prime }})\) is upper triangular. Then

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\, (C_{jj}-c)^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) C_{jj} } \mathcal {I} \{C_{jj}>c \}, \end{aligned}$$

    where \(c=\sum ^{q-1}_{i=1} R_{i,q}^2\) does not depend on \(C_{jj}\), and the indicator function \(\mathcal {I} \{C_{jj}>c \}\) ensures that \(C\) is positive definite. Let \(\nu =C_{jj}-c\); the conditional distribution of \(\nu \) is then

    $$\begin{aligned} p(\nu | \cdot ) \propto \,&\,\nu ^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) \nu } \mathcal {I} \{\nu >0 \}, \end{aligned}$$

    which follows \(\mathrm{Gamma}(\frac{n}{2}+1,\frac{\varLambda _{jj}+\lambda ^C}{2})\). Hence, we can first draw \(\nu \) from this Gamma distribution and then obtain \(C_{jj}=\nu +c\); a code sketch of this draw follows the list.

  2.

    Sampling the off-diagonal elements of \(C\), \(C_{jj^{\prime }}\), for \(j<j^{\prime }\): The complete conditional density of \(C_{jj^{\prime }}\) is

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&|C|^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }} } p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \\&\times \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Then using Lemma 2 of Wong et al. (2003),

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&\Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{ -\varLambda _{jj^{\prime }}C_{jj^{\prime }}} \\&\times p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \} \\ \propto&\Big [ (1-\rho ^C_{jj^{\prime }}) \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} \mathcal {I} \{C_{jj^{\prime }}=0 \} \\&+ \rho ^C_{jj^{\prime }} \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }}-\lambda ^C |C_{jj^{\prime }}| }\\&\times \mathcal {I} \{C_{jj^{\prime }} \ne 0 \} \Big ] \cdot \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \}, \end{aligned}$$

    where \(a=\sum ^{q-2}_{j=1} R_{j,q-1}R_{j,q}\), \(b=R_{q-1,q-1}\), and \(c=R_{q-1,q}^2+R_{q,q}^2\) do not depend on \(C_{jj^{\prime }}\). Transform \(C_{jj^{\prime }}\) to \(\nu = (C_{jj^{\prime }}-a)/(b\sqrt{c})\), and let \(\kappa =-a/(b\sqrt{c})\). The full conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot )&\propto \left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&\quad \quad \,\, + \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g(\nu ), \end{aligned}$$

    where \(g(\nu )=(1-\nu ^2)^{n/2} \exp \{ -\varLambda _{jj^{\prime }}(\nu b\sqrt{c} +a) -\lambda ^C|\nu b\sqrt{c} +a| \} \mathcal {I} \{|\nu |<1\}\). The positive definite constraint on \(C\) is ensured by the indicator function \(\mathcal {I} \{|\nu |<1\}\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Since \(g(\nu )\) has density only over \((-1,1)\) and is zero elsewhere, we use an independent MH algorithm analogous to the sampler for \(S_{jj^{\prime }}\) in Appendix 2, where its details are explained. Briefly, we choose \(K-1\) equally spaced grid points in \((-1,1)\), giving \(-1=\nu _0<\nu _1<\cdots <\nu _{K}=1\), which divide the domain of \(\nu \) into \(K\) intervals of width \(2/K\), and construct a piecewise uniform distribution \(g_a(\nu )\) approximating \(g(\nu )\). The independent MH proposal for generating \(\nu \) is then given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&+ \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated from the proposal distribution using an inverse-CDF method, and the proposal \(\nu ^*\) is accepted with probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = (C^c_{jj^{\prime }}-a)/(b\sqrt{c})\) denotes the current state of \(\nu \). Samples of \(C_{jj^{\prime }}\) are obtained as \(C_{jj^{\prime }}=\nu b\sqrt{c} +a\).

  3.

    Sampling the shrinkage parameter \(\lambda ^C\): The full conditional of \(\lambda ^C\) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|C_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q C_{jj}\right) \!, \end{aligned}$$

    where \(m\) is equal to the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, C_{jj^{\prime }} \ne 0 \}\).

  4.

    Sampling the selection parameters \(\rho ^C_{jj^{\prime }}\): The full conditional of \(\rho ^C_{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{C_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ C_{jj^{\prime }}=0 \}). \end{aligned}$$
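
As in Appendix 2, only the diagonal update has a standard form; a minimal sketch (ours) of the Gamma draw in step 1 follows. NumPy's `gamma` is parameterized by shape and scale, hence the reciprocal of the rate \((\varLambda _{jj}+\lambda ^C)/2\). The off-diagonal update of step 2 reuses the piecewise-uniform independent MH sampler sketched after Appendix 2, with \(g(\nu )\) as defined in step 2 and support \((-1,1)\).

```python
import numpy as np

def sample_C_diag(Lambda_jj, c, lam_C, n, rng):
    """Step 1: draw nu ~ Gamma(shape = n/2 + 1, rate = (Lambda_jj + lam_C)/2)
    and return C_jj = nu + c, where c = sum_{i<q} R_{i,q}^2 is computed by the
    caller from the Cholesky factorization C = R'R. Names are ours.
    """
    nu = rng.gamma(n / 2.0 + 1.0, 2.0 / (Lambda_jj + lam_C))
    return nu + c
```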


Cite this article

Zhang, L., Sarkar, A. & Mallick, B.K. Bayesian sparse covariance decomposition with a graphical structure. Stat Comput 26, 493–510 (2016). https://doi.org/10.1007/s11222-014-9540-7

