
Bayesian variable selection for mixed effects model with shrinkage prior


Abstract

Recently, many shrinkage priors have been proposed and studied in linear models to address massive regression problems. However, shrinkage priors are rarely used in mixed effects models. In this article, we address the problem of joint selection of both fixed effects and random effects using several shrinkage priors in linear mixed models. The idea is to shrink small coefficients to zero while shrinking large coefficients only minimally, owing to the priors' heavy tails. The shrinkage priors can be represented as scale mixtures of normal distributions to facilitate computation. We use a stochastic search Gibbs sampler to implement a fully Bayesian approach to variable selection. The approach is illustrated using simulated data and a real example.
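
For readers unfamiliar with the scale-mixture device, the classical Laplace identity is a convenient illustration (a textbook example, not necessarily one of the exact priors used in the paper): mixing a normal over an exponentially distributed variance gives

$$\begin{aligned} \beta \mid \kappa \sim N(0, \kappa), \quad \kappa \sim \text{Exp}(\lambda^2/2) \quad \Longrightarrow \quad p(\beta) = \int_0^\infty N(\beta \mid 0, \kappa)\, \tfrac{\lambda^2}{2} e^{-\lambda^2 \kappa/2}\, d\kappa = \tfrac{\lambda}{2} e^{-\lambda|\beta|}, \end{aligned}$$

so that, conditionally on the latent scale \(\kappa\), every update in a Gibbs sampler remains Gaussian-conjugate.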



Acknowledgements

The work is supported in part by the intramural University Graduate Program of San Diego State University. We thank Suba Sudarsan for assistance in producing some of the results, and Dr. Madanat, our colleagues, the Associate Editor, and two anonymous reviewers for their comments and suggestions.

Author information

Correspondence to Mingan Yang.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

We outline the detailed Gibbs sampler for posterior sampling in the steps below.
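
To fix ideas before the individual steps, the following self-contained Python sketch runs the analogous sweep for a plain linear model with a Bayesian-lasso-type scale mixture and no random effects, so that all \(J_k=1\) and steps 2, 5-8, 11 and 12 drop out. It is a minimal illustration of the scale-mixture mechanics behind steps 1, 4, 9 and 10, in the textbook Bayesian-lasso parameterization rather than the authors' exact mixed-model conditionals; all variable names and the simplified conditionals are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: n observations, p predictors, a sparse true coefficient vector.
n, p = 100, 8
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# State of the chain.
beta = np.zeros(p)
sigma2 = 1.0
tau2 = np.ones(p)   # local scale-mixture variances (the kappa_k of the appendix)
lam2 = 1.0          # global shrinkage parameter (plays the role of g)

XtX, Xty = X.T @ X, X.T @ y
n_iter, burn = 2000, 500
draws = np.zeros((n_iter, p))

for it in range(n_iter):
    # Step-1 analogue: beta | else is multivariate normal.
    A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
    beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

    # Step-4 analogue: sigma^2 | else is inverse gamma.
    resid = y - X @ beta
    rate = 0.5 * (resid @ resid + beta @ (beta / tau2))
    sigma2 = 1.0 / rng.gamma((n - 1 + p) / 2.0, 1.0 / rate)

    # Step-9 analogue: 1/tau_k^2 | else is inverse Gaussian; numpy's
    # wald(mean, scale) is the inverse Gaussian with mean and shape parameters.
    mu_ig = np.sqrt(lam2 * sigma2 / beta**2)
    tau2 = 1.0 / rng.wald(mu_ig, lam2)

    # Steps 3/10 analogue: gamma update for the global shrinkage parameter,
    # here under a Gamma(1, 1) hyperprior assumed for this sketch.
    lam2 = rng.gamma(p + 1.0, 1.0 / (0.5 * tau2.sum() + 1.0))

    draws[it] = beta

print("posterior means:", draws[burn:].mean(axis=0).round(2))
```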

  1. Given the data and the current values of \(g, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\xi}, \sigma^2, J\), the posterior of \(\boldsymbol{\beta}_J\) is sampled from a multivariate normal \(N(\boldsymbol{\mu}_{\boldsymbol{\beta}_J}, \boldsymbol{\Sigma}_{\boldsymbol{\beta}_J})\):

     $$\begin{aligned} \boldsymbol{\Sigma}_{\boldsymbol{\beta}_J}^{-1} &= \mathbf{X}^{J\prime} \mathbf{X}^{J}/\sigma^2 + (g/\sigma^2)\,\boldsymbol{\zeta}_J^{-1}, \qquad \boldsymbol{\zeta}_J = \mathrm{Diag}(\kappa_{k:J_k=1},\ k=1,\ldots,l),\\ \boldsymbol{\mu}_{\boldsymbol{\beta}_J} &= \boldsymbol{\Sigma}_{\boldsymbol{\beta}_J} \left\{ \sum_i \sum_j \mathbf{X}^{J}_{ij}\,\bigl(y_{ij} - \mathbf{Z}_{ij}' \boldsymbol{\Lambda}\boldsymbol{\Gamma}\boldsymbol{\xi}_i\bigr)/\sigma^2 \right\}. \end{aligned}$$
  2. Given the data and the current values of \(\boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\xi}, g, \sigma^2, \boldsymbol{\beta}\), the full conditional posterior of \(J_k\), \(k=1,\ldots,l\), is obtained by integrating out \(\boldsymbol{\beta}\) and \(\sigma^2\) (a log-scale computation sketch follows the list):

     $$\begin{aligned} p(J_k=1 \mid \text{else}) &= \frac{1}{1+\varpi_k}, \qquad \varpi_k = \frac{S_1(J_k=0)}{S_1(J_k=1)}\,\frac{S_2(J_k=0)}{S_2(J_k=1)}\,\frac{1-p_{k0}}{p_{k0}}\,\sqrt{\frac{\kappa_k}{g}},\\ S_1(J) &= \left\{ \sum_i \sum_j \bigl(y_{ij} - \mathbf{Z}_{ij}' \boldsymbol{\Lambda}\boldsymbol{\Gamma}\boldsymbol{\xi}_i\bigr)^2 - \sigma^2 \boldsymbol{\mu}_{\boldsymbol{\beta}_J}' \boldsymbol{\Sigma}_{\boldsymbol{\beta}_J}^{-1} \boldsymbol{\mu}_{\boldsymbol{\beta}_J} \right\}^{-\sum_i n_i/2},\\ S_2(J) &= \left| \sum_i \sum_j \mathbf{X}^{J}_{ij} \mathbf{X}^{J\prime}_{ij} + g\,\boldsymbol{\zeta}_J^{-1} \right|^{-1/2}. \end{aligned}$$
  3. Given the data and the updated values of \(J, \boldsymbol{\beta}, \sigma^2\), the posterior of \(g\) is sampled from \(G(a^*, b^*)\), where

     $$\begin{aligned} a^* &= \Bigl(1 + \sum_k J_k\Bigr)\Big/2,\\ b^* &= N/2 + \boldsymbol{\beta}^{J\prime} \boldsymbol{\zeta}_J^{-1} \boldsymbol{\beta}^{J}/(2\sigma^2). \end{aligned}$$
  4. Given the data and the updated values of \(J, \boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\xi}\), the posterior of \(\sigma^2\) is sampled from \(IG(h_1, h_2)\), where

     $$\begin{aligned} h_1 &= \frac{N + \sum_k J_k}{2},\\ h_2 &= \Bigl( \sum_i \sum_j \tau_{ij}^2 + g\,\boldsymbol{\beta}^{J\prime} \boldsymbol{\zeta}_J^{-1} \boldsymbol{\beta}^{J} \Bigr)\Big/2,\\ \tau_{ij} &= y_{ij} - \mathbf{X}_{ij}' \boldsymbol{\beta} - \mathbf{Z}_{ij}' \boldsymbol{\Lambda}\boldsymbol{\Gamma}\boldsymbol{\xi}_i. \end{aligned}$$
  5. Reparameterize model (2) as \(y_{ij}^* = y_{ij} - \sum_{k=1}^q z_{ijk}\lambda_k \xi_{ik}\) and \(y_{ij}^* = \mathbf{X}^{J\prime}_{ij} \boldsymbol{\beta}^J + \mathbf{o}_{ij}' \mathbf{r}\), where \(\mathbf{r}\) stacks the free elements \(\gamma_{fm}\) of \(\boldsymbol{\Gamma}\), \(f=2,\ldots,q\), \(m=1,\ldots,f-1\), and the corresponding elements of \(\mathbf{o}_{ij}\) are \(z_{ijf}\lambda_f\xi_{im}\). With prior \(\mathbf{r} \sim N(\boldsymbol{\mu}_{r0}, \boldsymbol{\Sigma}_{r0})\), the posterior is the truncated multivariate normal \(N(\boldsymbol{\mu}_r^*, \boldsymbol{\Sigma}_r^*)\,1(\mathbf{r} \in R_r)\), where \(R_r\) constrains elements of \(\mathbf{r}\) to be zero when the corresponding random effects are zero:

     $$\begin{aligned} \boldsymbol{\Sigma}_r^{*-1} &= \boldsymbol{\Sigma}_{r0}^{-1} + \sum_i \sum_j \mathbf{o}_{ij} \mathbf{o}_{ij}'/\sigma^2,\\ \boldsymbol{\mu}_r^* &= \boldsymbol{\Sigma}_r^* \left\{ \boldsymbol{\Sigma}_{r0}^{-1} \boldsymbol{\mu}_{r0} + \sum_i \sum_j \mathbf{o}_{ij}\,\bigl( y_{ij}^* - \mathbf{X}^{J\prime}_{ij} \boldsymbol{\beta}^J \bigr)/\sigma^2 \right\}. \end{aligned}$$
  6. Reparameterize \(T_{ij} = y_{ij} - \mathbf{X}^{J\prime}_{ij} \boldsymbol{\beta}^{J} - \sum_{k' \ne k} t_{ijk'}\lambda_{k'}\), where \(t_{ijk} = z_{ijk} \sum_{m=1}^k \gamma_{km}\xi_{im}\). For the prior \(\lambda_k \sim ZIN_+(p_{k0}, \mu_0, \sigma_0^2)\), the full conditional posterior is \(ZIN_+(p_k^*, \mu^*, \sigma^{*2})\) (a sampling sketch is given after this list):

     $$\begin{aligned} \sigma^{*-2} &= \sum_i \sum_j t_{ijk}^2/\sigma^2 + 1,\\ \mu^* &= \sigma^{*2}\Bigl( \sum_i \sum_j t_{ijk} T_{ij}/\sigma^2 \Bigr),\\ p_k^* &= \frac{p_{k0}}{p_{k0} + (1-p_{k0})\, \dfrac{N(0 \mid \mu_0, \sigma_0^2)}{N(0 \mid \mu^*, \sigma^{*2})}\, \dfrac{1-\Phi(0; \mu^*, \sigma^{*2})}{1-\Phi(0; \mu_0, \sigma_0^2)} }, \end{aligned}$$

     where \(\Phi(a; \mu, \sigma^2)\) denotes the normal CDF with mean \(\mu\) and variance \(\sigma^2\) evaluated at \(a\), and \(N(a \mid \mu, \sigma^2)\) the corresponding pdf.

  7. Sample the posterior of \(\boldsymbol{\xi}_i\) from a multivariate normal distribution with parameters

     $$\begin{aligned} \boldsymbol{\Sigma}^{*-1} &= \boldsymbol{\Sigma}_{\boldsymbol{\xi}}^{-1} + \sum_{j=1}^{n_i} \boldsymbol{\Gamma}'\boldsymbol{\Lambda}' \mathbf{Z}_{ij} \mathbf{Z}_{ij}' \boldsymbol{\Lambda}\boldsymbol{\Gamma}/\sigma^2, \qquad \boldsymbol{\Sigma}_{\boldsymbol{\xi}} = \mathrm{Diag}(\tau_k),\ k=1,\ldots,q,\\ \boldsymbol{\mu}^* &= \boldsymbol{\Sigma}^* \left\{ \sum_{j=1}^{n_i} \boldsymbol{\Gamma}'\boldsymbol{\Lambda}' \mathbf{Z}_{ij}\,\bigl(y_{ij} - \mathbf{X}^{J\prime}_{ij} \boldsymbol{\beta}^{J}\bigr)/\sigma^2 \right\}. \end{aligned}$$
  8. Given the updated parameter \(\lambda_k\), update \(\phi_k^2\) from the inverse gamma distribution \(IG(h_3, h_4)\), where

     $$\begin{aligned} h_3 &= 1,\\ h_4 &= (1 + \lambda_k)/2. \end{aligned}$$
  9. Given the updated parameters \(\varphi\), \(\sigma\), \(\boldsymbol{\beta}\) and \(g\), update \(\kappa\):

     $$\begin{aligned} \kappa_k \mid \text{else} \sim \text{Inv-Gau}\left( \left| \frac{\varphi_k \sigma}{\beta_k \sqrt{g}} \right|, \varphi_k^2 \right), \end{aligned}$$

     where \(\text{Inv-Gau}(a, b)\) is the inverse Gaussian distribution with mean parameter \(a\) and shape parameter \(b\) (a sampling sketch is given after this list).

  10. Given the updated parameters \(\sigma\), \(g\) and \(\boldsymbol{\beta}\), update \(\varphi\) from

      $$\begin{aligned} \varphi_k \sim \text{Gamma}\bigl(2, |\beta_k \sqrt{g}/\sigma| + 1\bigr). \end{aligned}$$
  11. Propose a candidate for \(\tau_k\) from the \(\text{Exp}(1)\) density; the proposed value \(\tau_k^*\) is accepted with probability \( \min \left\{ 1, \frac{f(\tau_k^*)\exp(-\tau_k^0)}{f(\tau_k^0)\exp(-\tau_k^*)} \right\} \), where \(\tau_k^0\) is the current value and \(f(\tau_k) = \tau_k^{-n/2} \exp\bigl( -\sum_i \xi_{ik}^2/(2\tau_k) - m_k^2\tau_k/2 \bigr)\) (see the Metropolis-Hastings sketch after this list).

  12. Similarly, propose a candidate for \(m_k\) from the \(\text{Exp}(1)\) density; the proposed value \(m_k^*\) is accepted with probability \( \min \left\{ 1, \frac{g(m_k^*)\exp(-m_k^0)}{g(m_k^0)\exp(-m_k^*)} \right\} \), where \(g(m_k) = m_k^2 \exp\bigl( -m_k^2\tau_k/2 - m_k \bigr)\).
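
The ratio \(\varpi_k\) in step 2 multiplies several factors that can easily under- or overflow in double precision. As a hedged illustration (the function name and argument layout are ours, not the paper's), the inclusion probability can be assembled on the log scale:

```python
import numpy as np

def inclusion_prob(log_S1_0, log_S1_1, log_S2_0, log_S2_1, p_k0, kappa_k, g):
    """p(J_k = 1 | else) from step 2, assembled on the log scale.
    log_S1_* and log_S2_* are log S_1 and log S_2 evaluated at J_k = 0 and 1."""
    log_varpi = ((log_S1_0 - log_S1_1) + (log_S2_0 - log_S2_1)
                 + np.log((1.0 - p_k0) / p_k0) + 0.5 * np.log(kappa_k / g))
    return 1.0 / (1.0 + np.exp(log_varpi))

# Hypothetical values, only to show the call:
print(inclusion_prob(-10.2, -9.8, -3.1, -3.0, p_k0=0.5, kappa_k=1.0, g=2.0))
```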
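Steps 6 and 9 require two less common draws: the zero-inflated positive normal \(ZIN_+\) and the inverse Gaussian. A minimal sketch using numpy/scipy follows; the helper names and the example values are ours, and numpy's wald generator is the inverse Gaussian in the mean/shape parameterization used above.

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(7)

def zin_pzero(p0, mu0, s20, mu_star, s2_star):
    """Point-mass probability p_k^* from step 6."""
    dens_ratio = norm.pdf(0, mu0, np.sqrt(s20)) / norm.pdf(0, mu_star, np.sqrt(s2_star))
    tail_ratio = norm.sf(0, mu_star, np.sqrt(s2_star)) / norm.sf(0, mu0, np.sqrt(s20))
    return p0 / (p0 + (1.0 - p0) * dens_ratio * tail_ratio)

def rzin_plus(p_zero, mu, s2, rng):
    """Draw from ZIN+(p_zero, mu, s2): zero with probability p_zero,
    otherwise a normal truncated to (0, inf)."""
    if rng.uniform() < p_zero:
        return 0.0
    sd = np.sqrt(s2)
    return truncnorm.rvs(-mu / sd, np.inf, loc=mu, scale=sd, random_state=rng)

# Step 6: draw lambda_k given hypothetical posterior quantities.
lam_k = rzin_plus(zin_pzero(0.5, 0.0, 1.0, 0.8, 0.2), 0.8, 0.2, rng)

# Step 9: kappa_k | else ~ Inv-Gau(mean, shape) via numpy's wald.
phi_k, sigma, beta_k, g = 1.2, 0.8, 0.5, 2.0
kappa_k = rng.wald(abs(phi_k * sigma / (beta_k * np.sqrt(g))), phi_k**2)
print(lam_k, kappa_k)
```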
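Steps 11 and 12 are independence Metropolis-Hastings moves sharing a common \(\text{Exp}(1)\) proposal; only the target kernel changes. A minimal sketch, assuming the kernels \(f\) and \(g\) exactly as displayed above (function names and the trial inputs are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_exp1(x_old, log_target, rng):
    """One independence MH update with an Exp(1) proposal q(x) = exp(-x):
    accept x* with probability min{1, f(x*) exp(-x_old) / (f(x_old) exp(-x*))}."""
    x_new = rng.exponential(1.0)
    log_alpha = log_target(x_new) - log_target(x_old) + (x_new - x_old)
    return x_new if np.log(rng.uniform()) < log_alpha else x_old

# Step 11 target for tau_k (n, sum_xi_sq, m_k are hypothetical inputs).
def log_f_tau(tau, n, sum_xi_sq, m_k):
    return -0.5 * n * np.log(tau) - sum_xi_sq / (2.0 * tau) - 0.5 * m_k**2 * tau

# Step 12 target for m_k (tau_k held at its current value).
def log_g_m(m, tau_k):
    return 2.0 * np.log(m) - 0.5 * m**2 * tau_k - m

tau_k, m_k = 1.0, 0.5
for _ in range(100):
    tau_k = mh_exp1(tau_k, lambda t: log_f_tau(t, 20, 15.0, m_k), rng)
    m_k = mh_exp1(m_k, lambda m: log_g_m(m, tau_k), rng)
print(tau_k, m_k)
```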



Cite this article

Yang, M., Wang, M. & Dong, G. Bayesian variable selection for mixed effects model with shrinkage prior. Comput Stat 35, 227–243 (2020). https://doi.org/10.1007/s00180-019-00895-x
