Abstract
We consider the problem of estimating covariance matrices that decompose as the sum of a low-rank component and a sparse component. This general covariance structure arises in several statistical models, including factor analysis and random effects models, where the low-rank component captures correlations among variables induced by latent factors or random effects and the sparse component captures the correlations of the remaining residuals. We propose a Bayesian method for estimating covariance matrices of this structure by representing the covariance model as a factor model with an unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation of the low-rank component, combined with a Bayesian lasso method for estimating the sparse component. Simulation studies show that our method can recover both the rank and the sparsity of the two respective components. We further extend our method to a latent-factor Markov graphical model, focusing on the sparse conditional graphical model of the residuals as well as on selecting the number of factors. We show through simulations that our Bayesian model can successfully recover both the number of latent factors and the Markov graphical model of the residuals.
References
Aguilar, O., West, M.: Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Bus. Econ. Stat. 18, 338–357 (2000)
Armstrong, H., Carter, C., Wong, K., Kohn, R.: Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat. Comput. 19(3), 303–316 (2009)
Atay-Kayis, A., Massam, H.: The marginal likelihood for decomposable and non-decomposable graphical Gaussian models. Biometrika 92, 317–335 (2005)
Baladandayuthapani, V., Talluri, R., Ji, Y., Coombes, K.R., Hennessy, B.T., Davies, M.A., Mallick, B.K.: Bayesian sparse graphical models for classification with application to protein expression data. Ann. Appl. Stat. 8(3), 1443–1468 (2014)
Bhattacharya, A., Dunson, D.: Sparse Bayesian infinite factor models. Biometrika 98, 291–306 (2011)
Bickel, P.J., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36, 2577–2604 (2008a)
Bickel, P.J., Levina, E.: Regularized estimation of large covariance matrices. Ann. Stat. 36, 199–227 (2008b)
Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98, 807–820 (2011)
Bonato, V., Baladandayuthapani, V., Broom, B.M., Sulman, E.P., Aldape, K.D., Do, K.: Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics 27, 359–367 (2010)
Brooks, S., Giudici, P., Roberts, G.: Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. J. R. Stat. Soc. 65, 39 (2003)
Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684 (2011)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), Article 11 (2011)
Carvalho, C.M., Chang, J., Lucas, J.E., Nevins, J., Wang, Q., West, M.: High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008)
Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21, 572–576 (2011)
Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)
Chiu, T.Y.M., Leonard, T., Tsui, K.W.: The matrix-logarithmic covariance model. J. Am. Stat. Assoc. 91, 198–210 (1996)
Dawid, A.P.: Conditional independence in statistical theory (with Discussion). J. R. Stat. Soc. 41, 1–31 (1979)
Dawid, A.P., Lauritzen, S.L.: Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 21(3), 1272–1317 (1993)
Deng, X., Tsui, K.W.: Penalized covariance matrix estimation using a matrix-logarithm transformation. J. Comput. Gr. Stat. 22, 494–512 (2013)
Fabrigar, L.R., Wegener, D.T., MacCallum, R.C., Strahan, E.J.: Evaluating the use of exploratory factor analysis in psychological research. Psychol. Methods 3, 272–299 (1999)
Fan, J., Fan, Y., Lv, J.: High dimensional covariance matrix estimation using a factor model. J. Econom. 147, 186–197 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Gelfand, A.E., Sahu, S.: Identifiability, Improper priors, and Gibbs sampling for generalized linear models. J. Am. Stat. Assoc. 94, 247–253 (1999)
Geweke, J.F., Zhou, G.: Measuring the pricing error of the arbitrage pricing theory. Rev. Financ. Stud. 9, 557–587 (1996)
Ghosh, J., Dunson, D.: Bayesian model selection in factor analytic models. In: Dunson, D. (ed.) Random Effect and Latent Variable Model Selection. Wiley, New York (2008)
Giudici, P.: Bayesian inference for graphical factor analysis models. Psychometrika 66(4), 577–592 (2001)
Giudici, P., Green, P.J.: Decomposable graphical Gaussian model determination. Biometrika 86, 785–801 (1999)
Glorfeld, L.W.: An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educ. Psychol. Meas. 55, 377–393 (1995)
Grzebyk, M., Wild, P., Chouanière, D.: On identification of multi-factor models with correlated residuals. Biometrika 91(1), 141–151 (2004)
Hoff, P.: Model averaging and dimension selection for the singular value decomposition. J. Am. Stat. Assoc. 102, 674–685 (2007)
Hoff, P., Niu, X.: A covariance regression model. Stat. Sin. 22, 729–753 (2012)
Huang, J., Liu, N., Pourahmadi, M., Liu, L.: Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93, 85–98 (2006)
Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)
Knowles, D., Ghahramani, Z.: Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5, 1534–1552 (2011)
Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)
Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88(2), 365–411 (2004)
Lee, S., Song, X.: Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika 29(1), 23–39 (2002)
Leonard, T., Hsu, J.S.J.: Bayesian inference for a covariance matrix. Ann. Stat. 20, 1669–1696 (1992)
Levina, E., Rothman, A.J., Zhu, J.: Sparse estimation of large covariance matrices via a nested lasso penalty. Ann. Appl. Stat. 1, 245–263 (2008)
Lindley, D.: Bayesian Statistics: A Review. SIAM, Philadelphia, PA (1971)
Liu, R.Z., Graham, K., Glubrecht, D.D., Germain, D.R., et al.: Association of FABP5 expression with poor survival in triple-negative breast cancer: implication for retinoic acid therapy. Am. J. Pathol. 178(3), 997–1008 (2011)
Liu, W., Luo, X.: High-dimensional sparse precision matrix estimation via sparse column inverse operator. Technical report. arXiv:1203.3896 (2012)
Lopes, H.F., Carvalho, C.M.: Factor stochastic volatility with time varying loadings and Markov switching regimes. J. Stat. Plan. Inference 137, 3082–3091 (2007)
Lopes, H.F., West, M.: Bayesian model assessment in factor analysis. Stat. Sin. 14, 41–67 (2004)
Luo, X.: High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv:1111.1133v1 (2013)
Pati, D., Bhattacharya, A., Pillai, N.S., Dunson, D.: Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42(3), 1102–1130 (2014)
Poirier, D.J.: Revisiting beliefs in non-identified models. Technical report, University of Toronto, (1998)
Roverato, A.: Hyper-inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Stat. 29, 391–411 (2002)
Rowe, D.B.: Multivariate Bayesian Statistics. Chapman & Hall, London (2003)
Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., et al.: Genome-wide associations of gene expression variation in humans. PLoS Genet. 1(6), e78 (2005)
Stanghellini, E.: Identification of a single-factor model using graphical Gaussian rules. Biometrika 84(1), 241–244 (1997)
Wang, H.: Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 7, 867–886 (2012)
Wong, F., Carter, C., Kohn, R.: Efficient estimation of covariance selection models. Biometrika 90, 809–830 (2003)
Wu, W.B., Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844 (2003)
Wu, W.B., Pourahmadi, M.: Banding sample auto covariance matrices of stationary processes. Stat. Sin. 19, 1755 (2010)
Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)
Appendices
Appendix 1: Derivation of the conditional densities for the elements in \(S\)
1.1 The conditional densities for the diagonal elements in S
For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).
For each diagonal element \(S_{jj}\) in S, the full conditional density is
$$\begin{aligned} p(S_{jj} | \cdot ) \propto \, |S|^{-\frac{n}{2}} e^{-\frac{tr(\varLambda S^{-1})}{2}-\frac{\lambda S_{jj}}{2}} \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$
Without loss of generality, suppose that \(j=q\). Let \(S = \left[ \begin{array}{cc} S_{-j,-j} &{} S_{-j,j} \\ S_{j,-j} &{} S_{jj} \end{array} \right] \). By the properties of the matrix determinant and the matrix inverse, we have
$$\begin{aligned} |S| =&\, |S_{-j,-j}| \, (S_{jj}-c), \\ tr(\varLambda S^{-1}) =&\, \frac{d}{S_{jj}-c} + \text {const}, \end{aligned}$$
where \(c = S_{j,-j} S^{-1}_{-j,-j} S_{-j,j}\), \(d = [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array}] \varLambda [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array}]^T\), and the constant does not depend on \(S_{jj}\). Hence, we have
$$\begin{aligned} p(S_{jj} | \cdot ) \propto \, (S_{jj}-c)^{-\frac{n}{2}} e^{-\frac{d}{2}(S_{jj}-c)^{-1}-\frac{\lambda }{2} S_{jj}} \mathcal {I} \{S_{jj}>c\}. \end{aligned}$$
1.2 The conditional densities for the off-diagonal elements in S
The conditional density of the off-diagonal element of S, \(S_{jj^{\prime }}\), is
$$\begin{aligned} p(S_{jj^{\prime }} | \cdot ) \propto \,&|S|^{-\frac{n}{2}} e^{-tr(\varLambda S^{-1})/2 } \Big [(1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+\}. \end{aligned}$$
Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Let \(S = \left[ \begin{array}{cc} S_{-(jj^{\prime })} &{} S_{-(jj^{\prime }),jj^{\prime }} \\ S_{jj^{\prime },-(jj^{\prime })} &{} S_{jj^{\prime },jj^{\prime }} \end{array} \right] \), where \(S_{jj^{\prime },jj^{\prime }}=\left[ \begin{array}{cc} S_{jj} &{} S_{jj^{\prime }} \\ S_{jj^{\prime }} &{} S_{j^{\prime }j^{\prime }} \end{array} \right] \).
By the properties of the matrix determinant and the matrix inverse, we have
$$\begin{aligned} |S| =&\, |S_{-(jj^{\prime })}| \, (ab-\nu ^2), \\ tr(\varLambda S^{-1}) =&\, \frac{bD_{11}+aD_{22}-2D_{12}\nu }{ab-\nu ^2} + \text {const}, \end{aligned}$$
where
$$\begin{aligned} \nu =&\, S_{jj^{\prime }}-B_{12}, \quad a = S_{jj}-B_{11}, \quad b = S_{j^{\prime }j^{\prime }}-B_{22}, \\ B =&\, S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D =&\, [\begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}&-I_2 \end{array}] \varLambda [\begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}&-I_2 \end{array}]^T, \end{aligned}$$
and the constant does not depend on \(S_{jj^{\prime }}\). Hence, we have
$$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto \,&\Big (1-\frac{\nu ^2}{ab}\Big )^{-\frac{n}{2}} \exp \left\{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \right\} \\&\times \Big [ (1-\rho _{jj^{\prime }})\mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \{\nu ^2<ab\}. \end{aligned}$$
Appendix 2: Posterior inference for the Bayesian covariance decomposition model
In this section, we present the full conditional posterior distributions of the proposed Bayesian covariance decomposition model under the assumption of a sparse covariance graphical model for the residuals. We also present a framework to carry out Markov chain Monte Carlo (MCMC) calculations. Note that the full conditional distributions of most parameters are available in closed form, allowing for a straightforward Gibbs sampling algorithm, except for the off-diagonal elements of the sparse matrix \(S\). For those, we employ an independent Metropolis–Hastings (MH) sampler within the Gibbs sampler to generate posterior samples.
1. Sampling the factor loading matrix \(M\): Let \(\mathbf {M}_k=(m_{1k},\ldots ,m_{qk})^T\) be the \(k\)th column vector of \(M\), and \(M_{(-k)}\) be the matrix of \(M\) excluding the \(k\)th column. The full conditional distribution of \(\mathbf {M}_k\) for \(k=1,\ldots ,r\) is
$$\begin{aligned} \mathbf {M}_k|y,M_{(-k)},f,S&\sim \mathcal {N}_q(\varvec{\mu }^M_k,\varSigma ^M_k), \end{aligned}$$where
$$\begin{aligned} \varSigma ^M_k =&\,\Big \{ S^{-1}\left( \sum ^n_{i=1}f^2_{ki} \right) + qI_q \Big \}^{-1}, \\ \varvec{\mu }^M_k =&\, \varSigma ^M_k S^{-1}(y-M_{ (-k)}f_{(-k)\cdot })\mathbf {f}_{k\cdot }^T. \end{aligned}$$Hence, we can draw samples of each column of \(M\) from a multivariate Gaussian distribution.
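As a minimal sketch of this update (not the authors' implementation; the function name, array layout, and use of numpy are our assumptions), the Gibbs draw for one loading column could look like:

```python
import numpy as np

def sample_loading_column(y, M, f, S_inv, k, rng):
    """Gibbs draw of the k-th loading column from its Gaussian full
    conditional: Sigma_k = {S^{-1} sum_i f_ki^2 + q I_q}^{-1},
    mu_k = Sigma_k S^{-1} (y - M_{(-k)} f_{(-k).}) f_k.^T.
    Shapes: y (q, n), M (q, r), f (r, n), S_inv (q, q) = inverse of S."""
    q = M.shape[0]
    resid = y - np.delete(M, k, axis=1) @ np.delete(f, k, axis=0)
    Sigma_k = np.linalg.inv(S_inv * np.sum(f[k] ** 2) + q * np.eye(q))
    mu_k = Sigma_k @ S_inv @ resid @ f[k]
    L = np.linalg.cholesky(Sigma_k)      # draw N(mu, Sigma) via Cholesky
    return mu_k + L @ rng.standard_normal(q)
```

In a full sampler this draw would be looped over \(k=1,\ldots,r\) within each Gibbs scan.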
2. Sampling the random factors \(f\): Let \(f\) be the \(r \times n\) matrix with \(f_{ki}\) the value of the \(k\)th factor in the \(i\)th replicate. Then \(\mathbf {f}_{k\cdot } = (f_{k1},\ldots ,f_{kn})\) is the \(k\)th row vector of \(f\), and \(f_{(-k)\cdot }\) denotes the matrix of \(f\) excluding the \(k\)th row. Note that \(\mathbf {f}_{k\cdot }\) can be viewed as the unobserved values of the random factor \(k\) in the \(n\) replicates. The full conditional distribution of the transpose of \(\mathbf {f}_{k\cdot }\) for \(k=1,\ldots ,r\) is
$$\begin{aligned} \mathbf {f}^T_{k\cdot } | y,M, f_{(-k)\cdot }, S,z_k,\tau ^2_k \sim \,&(1-z_k)\mathcal {N}_n(0,\tau _k^2 I_n) \\&+ z_k \mathcal {N}_n(\varvec{\mu }^f_k, \sigma ^f_k I_n), \end{aligned}$$where
$$\begin{aligned} \sigma ^f_k =&\,\Big ( M^T_k S^{-1} M_k +\tau ^{-2}_k \Big ) ^{-1}, \\ \varvec{\mu }^{f}_k =&\,\sigma ^{f}_k \Big \{ M^T_kS^{-1}(y-M_{(-k)}f_{(-k)\cdot }) \Big \}^T. \end{aligned}$$Hence we can draw samples of each row of \(f\) from a mixture of two multivariate Gaussian distributions.
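A corresponding sketch for one row of \(f\) (again our own illustrative code, not the paper's) draws from the spike when \(z_k=0\) and from the slab otherwise:

```python
import numpy as np

def sample_factor_row(y, M, f, S_inv, k, z_k, tau2_k, rng):
    """Draw f_k. from (1-z_k) N(0, tau_k^2 I_n) + z_k N(mu_k^f, sigma_k^f I_n),
    with sigma_k^f = (M_k^T S^{-1} M_k + tau_k^{-2})^{-1} (a scalar) and
    mu_k^f = sigma_k^f {M_k^T S^{-1} (y - M_{(-k)} f_{(-k).})}."""
    n = y.shape[1]
    if z_k == 0:
        return rng.normal(0.0, np.sqrt(tau2_k), size=n)
    M_k = M[:, k]
    sigma_f = 1.0 / (M_k @ S_inv @ M_k + 1.0 / tau2_k)
    resid = y - np.delete(M, k, axis=1) @ np.delete(f, k, axis=0)
    mu_f = sigma_f * (M_k @ S_inv @ resid)
    return mu_f + np.sqrt(sigma_f) * rng.standard_normal(n)
```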
3. Sampling the binary diagonal matrix \(Z\): The full conditional of each diagonal element of \(Z\), \(z_k\), is a Bernoulli distribution:
$$\begin{aligned} z_k | y,M,f,S,\tau ^2_k,p_k \sim&Bernoulli(p^*_k), \end{aligned}$$where \(p^*_k = 1- (1-p_k)/\Big \{ 1-p_k+p_k\cdot \Big ( \frac{\sigma ^f_k}{\tau ^2_k} \Big )^{\frac{n}{2}} \times \exp \Big ( - \frac{(\varvec{\mu }^f_k)^T \varvec{\mu }^f_k}{2\sigma ^f_k} \Big ) \Big \}\).
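Because the factor \(( \sigma^f_k/\tau^2_k )^{n/2}\) can overflow in floating point, one might compute \(p^*_k\) in log space. A sketch implementing the expression above (the function name and argument layout are our assumptions):

```python
import numpy as np

def sample_z(mu_f, sigma_f, tau2, p_k, n, rng):
    """Draw z_k ~ Bernoulli(p*_k), with the factor
    BF = (sigma_f / tau2)^{n/2} exp(-mu_f^T mu_f / (2 sigma_f))
    handled in log space; p*_k = p_k BF / {(1 - p_k) + p_k BF}."""
    log_bf = 0.5 * n * (np.log(sigma_f) - np.log(tau2)) \
             - (mu_f @ mu_f) / (2.0 * sigma_f)
    log_odds = np.log(p_k) - np.log1p(-p_k) + log_bf
    p_star = 1.0 / (1.0 + np.exp(-log_odds))
    return rng.uniform() < p_star
```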
4. Sampling the probabilities \(p_k\) and \(\pi \): The full conditional of \(p_k\) for \(k=1,\ldots ,r\) is
$$\begin{aligned} p_k | z_k=1&\sim Beta ( a_p+1, b_p ), \\ p_k | z_k=0&\sim (1-\pi ^*) \mathcal {I}\{p_k=0\} + \pi ^*Beta(a_p, b_p+1), \end{aligned}$$where \(\pi ^* = \frac{\pi b_p}{a_p+b_p-\pi a_p}\). The full conditional of \(\pi \) is
$$\begin{aligned} Beta \left( a_{\pi }+\sum _k \mathcal {I}\{p_k=0\}, b_{\pi }+ \sum _k \mathcal {I}\{p_k \ne 0 \} \right) . \end{aligned}$$
5. Sampling the positive diagonal matrix \(D_\tau \): The full conditional of each diagonal element of \(D_\tau \), \(\tau ^2_k\), for \(k=1,\ldots ,r\) is
$$\begin{aligned} (1-z_k) \mathrm{IG}(a_\tau ,b_\tau ) + z_k \mathrm{IG} \left( a_\tau +\frac{n}{2}, b_\tau +\frac{\mathbf {f}_{k \cdot } \mathbf {f}^T_ {k\cdot }}{2} \right) . \end{aligned}$$
6. Sampling the diagonal elements \(S_{jj}\): The full posterior conditional density of the diagonal element \(S_{jj}\), for \(j=1,\ldots ,q\), is
$$\begin{aligned} p(S_{jj} | \cdot ) \propto \, |S|^{-\frac{n}{2}} e^ {-\frac{tr(\varLambda S^{-1})}{2}-\frac{\lambda S_{jj} }{2} } \mathcal {I} \{S \in \mathcal {S}^+ \}, \end{aligned}$$where \(\varLambda =(y-Mf)(y-Mf)^T\). Without loss of generality, suppose that \(j=q\). By the properties of the matrix inverse and the matrix determinant, we have
$$\begin{aligned} p(S_{jj} | \cdot ) \propto \, (S_{jj}-c)^{-\frac{n}{2}} e^{-\frac{d}{2}(S_{jj}-c)^{-1} -\frac{\lambda }{2} S_{jj} } \mathcal {I} \{S_{jj}>c\}, \end{aligned}$$where \(c = S_{j,-j} S^{-1}_{-j,-j} S_{-j,j}\) and \(d = [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} ] \varLambda [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} ]^T\). Note that the indicator function \(\mathcal {I} \{S_{jj}>c\}\) ensures the positive definiteness of \(S\) conditional on the other elements in \(S\); the derivation of the full conditional distributions for the elements of \(S\) is detailed in Appendix 1. Transforming \(S_{jj}\) to \(\nu =S_{jj}-c\), the conditional density of \(\nu \) is
$$\begin{aligned} p(\nu |\cdot ) \propto \, \nu ^{-\frac{n}{2}} \exp \{-(d/\nu +\lambda \nu )/2\} \mathcal {I} \{\nu >0\}, \end{aligned}$$which is a generalized inverse Gaussian (GIG) distribution with parameters \((1-n/2,d,\lambda )\). Therefore, at each MCMC iteration, we can draw a sample of \(\nu \) from the GIG distribution, and obtain \(S_{jj}=\nu +c\).
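As an illustration (our sketch, not the paper's code), this GIG draw maps onto scipy.stats.geninvgauss, whose density in the scaled variable is \(\propto x^{p-1}e^{-b(x/s+s/x)/2}\); matching terms with \(p(\nu)\propto \nu^{-n/2}e^{-(d/\nu+\lambda\nu)/2}\) gives \(p=1-n/2\), \(b=\sqrt{\lambda d}\), and scale \(s=\sqrt{d/\lambda}\):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_diag_S(n, c, d, lam, rng):
    """Draw S_jj = nu + c with nu ~ GIG(1 - n/2, d, lam), i.e.
    p(nu) proportional to nu^{-n/2} exp{-(d/nu + lam*nu)/2} on nu > 0."""
    p = 1.0 - n / 2.0
    nu = geninvgauss.rvs(p, np.sqrt(lam * d),
                         scale=np.sqrt(d / lam), random_state=rng)
    return nu + c
```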
7. Sampling the off-diagonal elements \(S_{jj^{\prime }}\): The full posterior conditional density of the off-diagonal element \(S_{jj^{\prime }}\), \(j<j^{\prime }\), is
$$\begin{aligned} p(S_{jj^{\prime }} | \cdot ) \propto \,&|S|^{-\frac{n}{2}} e^{-tr(\varLambda S^{-1})/2 } \Big [(1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+\} . \end{aligned}$$Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). By the properties of the matrix inverse and the matrix determinant, we have
$$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto \,&(1\!-\!\frac{\nu ^2}{ab})^{\!-\!\frac{n}{2}} \exp \left\{ \!-\!\frac{bD_{11}\!+\!aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \right\} \\&\times \Big [ (1-\rho _{jj^{\prime }})\mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \{\nu ^2<ab\} \end{aligned}$$where
$$\begin{aligned} \nu =&S_{jj^{\prime }}-B_{12}, \\ a=&S_{jj}-B_{11}, \\ b=&S_{j^{\prime }j^{\prime }}-B_{22}, \\ B=&S_{jj^{\prime }-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime })jj^{\prime }} \\ D =&[S_{jj^{\prime } -(jj^{\prime })}S^{-1}_{-(jj^{\prime })}\text { } -I_2] \varLambda [S_{jj^{\prime } -(jj^{\prime })}S^{-1}_{-(jj^{\prime })}\text { } -I_2]^T. \end{aligned}$$We make variable transformation, and the conditional density of \(\nu \) is
$$\begin{aligned} p(\nu |\cdot ) \propto&\left( 1-\rho _{jj^{\prime }}\right) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab\} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g(\nu ), \end{aligned}$$where
$$\begin{aligned} g(\nu ) =&\, (1-\frac{\nu ^2}{ab})^{-\frac{n}{2}} \exp \Big \{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \\&-\lambda |\nu +B_{12}| \Big \} \mathcal {I} \{\nu ^2<ab\}. \end{aligned}$$Note that the positive definite constraint on \(S\) is ensured by the indicator \(\mathcal {I} \{\nu ^2<ab\}\). That is, the posterior distribution of \(\nu \) is only over the region \((-\sqrt{ab},\sqrt{ab})\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Furthermore, \(g(\nu )\) is nonconcave and therefore the sampler may be trapped locally if we use the random-walk MH algorithm within the Gibbs sampling. Since \(g(\nu )\) has density only over \((-\sqrt{ab},\sqrt{ab})\) and is zero elsewhere, we construct a piecewise uniform proposal distribution that approximates \(g(\nu )\). We choose \(\kappa -1\) equally spaced grids between (\(-\sqrt{ab}\), \(\sqrt{ab})\), \(-\sqrt{ab}=\nu _0<\nu _1<\cdots <\nu _{\kappa }=\sqrt{ab}\), which divide the domain of \(\nu \) into \(\kappa \) intervals of width \(2\sqrt{ab}/\kappa \). The piecewise uniform distribution is as follows:
$$\begin{aligned} g_a(\nu ) = \left\{ \begin{array}{ll} g\left( \frac{\nu _0 + \nu _1}{2}\right) &{} \quad \text{ if } \, \nu _0 < \nu \le \nu _1 \\ g\left( \frac{\nu _1 + \nu _2}{2}\right) &{} \quad \text{ if } \, \nu _1 < \nu \le \nu _2 \\ \cdots &{} \\ g\left( \frac{\nu _{\kappa -1} + \nu _\kappa }{2}\right) &{} \quad \text{ if } \nu _{\kappa -1} < \nu < \nu _\kappa . \end{array} \right. \end{aligned}$$The independent MH proposal distribution for generating \(\nu =S_{jj^{\prime }}-B_{12}\) is given by
$$\begin{aligned} q(\nu |\cdot ) \propto&\, (1-\rho _{jj^{\prime }}) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab \} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g_a(\nu ). \end{aligned}$$Samples of \(\nu \) can be generated by using an inverse-cumulative distribution function (CDF) method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability
$$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$where \(\nu ^c = S^c_{jj^{\prime }}-B_{12}\) denotes the current state of \(\nu \). The piecewise uniform proposal distribution avoids the local-trap problem and can be sampled easily using an inverse-CDF method. Furthermore, \(q(\nu | \cdot )\) approximates the distribution \(p(\nu |\cdot )\) more accurately with an increased number of grids. Based on our simulations, \(100\) grids are enough for a fast convergence of \(S_{jj^{\prime }}\).
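To make the construction concrete, here is our own sketch of the piecewise-uniform approximation and its inverse-CDF draw (the grid count default, function names, and the numpy layout of \(D\) are assumptions):

```python
import numpy as np

def g(nu, a, b, D, lam, B12, n):
    """Continuous part g(nu) of the conditional of nu = S_jj' - B_12."""
    ab = a * b
    return (1.0 - nu**2 / ab) ** (-n / 2.0) * np.exp(
        -(b * D[0, 0] + a * D[1, 1] - 2.0 * D[0, 1] * nu)
        / (2.0 * (ab - nu**2)) - lam * np.abs(nu + B12))

def draw_from_g_approx(a, b, D, lam, B12, n, rng, kappa=100):
    """Inverse-CDF draw from the piecewise-uniform approximation g_a,
    built on kappa equal-width intervals over (-sqrt(ab), sqrt(ab))."""
    half = np.sqrt(a * b)
    edges = np.linspace(-half, half, kappa + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])       # strictly inside the support
    w = g(mids, a, b, D, lam, B12, n)           # unnormalized interval masses
    cdf = np.cumsum(w) / np.sum(w)
    i = min(np.searchsorted(cdf, rng.uniform()), kappa - 1)
    return rng.uniform(edges[i], edges[i + 1])  # uniform within the interval
```

An accept/reject step with the ratio \(\alpha\) above, plus the point mass at \(\nu=-B_{12}\), completes the mixture proposal.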
8. Sampling the shrinkage parameter \(\lambda \): The full conditional of \(\lambda \) is
$$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|S_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q S_{jj}\right) , \end{aligned}$$where \(m\) equals the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, S_{jj^{\prime }} \ne 0 \}\).
9. Sampling the selection parameters \(\rho _{jj^{\prime }}\): The full conditional of \(\rho _{jj^{\prime }}\) for \(j<j^{\prime }\) is
$$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{S_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ S_{jj^{\prime }}=0 \}). \end{aligned}$$
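Steps 8 and 9 are standard conjugate draws from the sparsity pattern of \(S\); a compact sketch (function name and array layout are our assumptions):

```python
import numpy as np

def sample_lambda_rho(S, a_lam, b_lam, a_rho, b_rho, rng):
    """lambda ~ Gamma(a + m, b + sum_{j<j'} |S_jj'| + sum_j S_jj / 2), with m
    the number of nonzero entries with j <= j'; each off-diagonal
    rho_jj' ~ Beta(a_rho + 1{S_jj' != 0}, b_rho + 1{S_jj' = 0})."""
    q = S.shape[0]
    off = S[np.triu_indices(q, k=1)]
    m = np.count_nonzero(S[np.triu_indices(q)])
    rate = b_lam + np.abs(off).sum() + 0.5 * np.trace(S)
    lam = rng.gamma(a_lam + m, 1.0 / rate)   # numpy uses scale = 1/rate
    rho = rng.beta(a_rho + (off != 0), b_rho + (off == 0))
    return lam, rho
```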
Appendix 3: Posterior inference for the latent-factor conditional graphical model
We derive the full conditionals for all the parameters and perform posterior inference using a Gibbs sampling algorithm. Note that the prior specification for the parameters \(\{M, z_k, \tau _k\}\) in the graphical factor analysis model parallels that in the hierarchical model in formulas (9), so the full conditionals of these parameters for a Gibbs algorithm are similar to those in Appendix 2. In this section, we present the sampling algorithm for the parameter set \(\{C,\lambda ^C,\varvec{\rho }^C\}\) of the graphical model of the residuals.
1.1 Sampling \(\{C,\lambda ^C,\rho ^C\}\) for the graphical model of the residuals
For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).
1. Sampling the diagonal elements of \(C\), \(C_{jj}\), for \(j=1,\ldots ,q\): The full conditional density of \(C_{jj}\) is
$$\begin{aligned} p(C_{jj} | \cdot ) \propto&\,|C|^{\frac{n}{2}} e^{ -(\varLambda _{jj}C_{jj} + \lambda ^C C_{jj})/2 } \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$Without loss of generality, suppose that \(j=q\). Let \(C=R^{\prime }R\) be the Cholesky decomposition of \(C\) where the matrix \(R=(R_{jj^{\prime }})\) is upper triangular. Then
$$\begin{aligned} p(C_{jj} | \cdot ) \propto&\, (C_{jj}-c)^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) C_{jj} } \mathcal {I} \{C_{jj}>c \}, \end{aligned}$$where \(c=\sum ^{q-1}_{j=1} R_{j,q}^2\) does not depend on \(C_{jj}\), and the indicator function \(\mathcal {I} \{C_{jj}>c \}\) ensures that \(C\) is positive definite. Let \(\nu =(C_{jj}-c)\), then the conditional distribution of \(\nu \) is
$$\begin{aligned} p(\nu | \cdot ) \propto \,&\,\nu ^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) \nu } \mathcal {I} \{\nu >0 \}, \end{aligned}$$which follows \(\mathrm{Gamma}(\frac{n}{2}+1,\frac{\varLambda _{jj}+\lambda ^C}{2})\). Hence, we can draw samples of \(\nu \) from the Gamma distribution first, and obtain \(C_{jj}=\nu +c\).
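A sketch of this draw for the last diagonal entry (our code; the Cholesky convention \(C=R^{\prime}R\) with \(R\) upper triangular follows the text):

```python
import numpy as np

def sample_diag_C(C, n, Lambda_jj, lam_C, rng):
    """Draw C_qq = nu + c with nu ~ Gamma(n/2 + 1, (Lambda_jj + lam_C)/2)
    and c = sum_{j<q} R_{j,q}^2 from the Cholesky factor of C."""
    R = np.linalg.cholesky(C).T              # upper triangular, C = R^T R
    c = np.sum(R[:-1, -1] ** 2)
    rate = (Lambda_jj + lam_C) / 2.0
    nu = rng.gamma(n / 2.0 + 1.0, 1.0 / rate)
    return nu + c
```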
2. Sampling the off-diagonal elements of \(C\), \(C_{jj^{\prime }}\), for \(j<j^{\prime }\): The full conditional density of \(C_{jj^{\prime }}\) is
$$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&|C|^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }} } p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \\&\times \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Then using Lemma 2 of Wong et al. (2003),
$$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&\Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{ -\varLambda _{jj^{\prime }}C_{jj^{\prime }}} \\&\times p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \} \\ \propto&\Big [ (1-\rho ^C_{jj^{\prime }}) \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} \mathcal {I} \{C_{jj^{\prime }}=0 \} \\&+ \rho ^C_{jj^{\prime }} \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }}-\lambda ^C |C_{jj^{\prime }}| }\\&\times \mathcal {I} \{C_{jj^{\prime }} \ne 0 \} \Big ] \cdot \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \}, \end{aligned}$$where \(a=\sum ^{q-2}_{j=1} R_{j,q-1}R_{j,q}\), \(b=R_{q-1,q-1}\), and \(c=R_{q-1,q}^2+R_{q,q}^2\) do not depend on \(C_{jj^{\prime }}\). Transform \(C_{jj^{\prime }}\) to \(\nu = (C_{jj^{\prime }}-a)/(b\sqrt{c})\), and let \(\kappa =-a/(b\sqrt{c})\). The full conditional density of \(\nu \) is
$$\begin{aligned} p(\nu |\cdot )&\propto \left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&\quad + \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g(\nu ), \end{aligned}$$where \(g(\nu )=(1-\nu ^2)^{n/2} \exp \{ -\varLambda _{jj^{\prime }}(\nu b\sqrt{c} +a) -\lambda ^C|\nu b\sqrt{c} +a| \} \mathcal {I} \{|\nu |<1\}\). The positive definite constraint on \(C\) is ensured by the indicator function \( \mathcal {I} \{|\nu |<1\}\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Since \(g(\nu )\) has density only over \((-1,1)\) and is zero elsewhere, we can use the same independent MH algorithm as for \(S_{jj^{\prime }}\) in Appendix 2. Briefly, we choose \(K-1\) equally spaced grid points between \((-1,1)\), \(-1=\nu _0<\nu _1<\cdots <\nu _{K}=1\), which divide the domain of \(\nu \) into \(K\) intervals of width \(2/K\), and construct a piecewise uniform distribution \(g_a(\nu )\) approximating \(g(\nu )\). The independent MH proposal for generating \(\nu \) is then given by
$$\begin{aligned} q(\nu |\cdot ) \propto&\left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&+ \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g_a(\nu ). \end{aligned}$$Samples of \(\nu \) can be generated using an inverse-CDF method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability
$$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$where \(\nu ^c = (C^c_{jj^{\prime }}-a)/(b\sqrt{c})\) denotes the current state of \(\nu \). Samples of \(C_{jj^{\prime }}\) are obtained as \(C_{jj^{\prime }}=\nu b\sqrt{c} +a\).
3. Sampling the shrinkage parameter \(\lambda ^C\): The full conditional of \(\lambda ^C\) is
$$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|C_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q C_{jj}\right) \!, \end{aligned}$$where \(m\) is equal to the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, C_{jj^{\prime }} \ne 0 \}\).
4. Sampling the selection parameters \(\rho ^C_{jj^{\prime }}\): The full conditional of \(\rho ^C_{jj^{\prime }}\) for \(j<j^{\prime }\) is
$$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{C_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ C_{jj^{\prime }}=0 \}). \end{aligned}$$
Zhang, L., Sarkar, A. & Mallick, B.K. Bayesian sparse covariance decomposition with a graphical structure. Stat Comput 26, 493–510 (2016). https://doi.org/10.1007/s11222-014-9540-7