Abstract
In this paper, we discuss the selection of random effects within the framework of generalized linear mixed models (GLMMs). Based on a reparametrization of the covariance matrix of the random effects in terms of the modified Cholesky decomposition, we propose adding a shrinkage penalty term to the penalized quasi-likelihood (PQL) function of the variance components in order to select effective random effects. The shrinkage penalty term is taken as a function of the variance of the random effects, motivated by the fact that if the variance is zero then the corresponding variable is no longer random (with probability one). The proposed method takes advantage of the computational convenience of PQL estimation and of the appealing properties of certain shrinkage penalty functions such as the LASSO and SCAD. We propose a backfitting algorithm that estimates the fixed effects and variance components in GLMMs and simultaneously selects the effective random effects. Simulation studies show that the proposed approach performs well in selecting effective random effects in GLMMs, and the approach is further illustrated through real data analysis.
References
Ahn, M., Zhang, H.H., Lu, W.: Moment-based method for random effects selection in linear mixed models. Stat. Sin. 22, 1539–1562 (2012)
Bondell, H.D., Krishna, A., Ghosh, S.K.: Joint variable selection for fixed and random effects in linear mixed models. Biometrics 66, 1069–1077 (2010)
Breiman, L.: Heuristics of instability and stabilization in model selection. Ann. Stat. 24, 2350–2383 (1996)
Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)
Breslow, N.E., Lin, X.H.: Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika 82, 81–91 (1995)
Chen, Z., Dunson, D.B.: Random effects selection in linear mixed models. Biometrics 59, 762–769 (2003)
Fahrmeir, L., Kneib, T., Konrath, S.: Bayesian regularisation in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection. Stat. Comput. 20, 203–219 (2010)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, Y., Li, R.Z.: Variable selection in linear mixed effects models. Ann. Stat. 40, 2043–2068 (2012)
Frank, I.E., Friedman, J.H.: A statistical view of some chemometric regression tools (with discussion). Technometrics 35, 109–148 (1993)
Groll, A., Tutz, G.: Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. (2013). doi:10.1007/s11222-012-9359-z
Groll, A., Tutz, G.: Regularization for generalized additive mixed models by likelihood-based boosting. Methods Inf. Med. 51, 168–177 (2012)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970a)
Hoerl, A.E., Kennard, R.W.: Ridge regression: application to nonorthogonal problems. Technometrics 12, 69–82 (1970b)
Ibrahim, J.G., Zhu, H., Garcia, R.I., Guo, R.: Fixed and random effects selection in mixed effects models. Biometrics 67, 495–503 (2011)
Kaslow, R.A., Ostrow, D.G., Detels, R., et al.: The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am. J. Epidemiol. 126, 310–318 (1987)
Kinney, S.K., Dunson, D.B.: Fixed and random effects selection in linear and logistic models. Biometrics 63, 690–698 (2007)
Lin, B., Pang, Z., Jiang, J.: Fixed and random effects selection by REML and pathwise coordinate optimization. J. Comput. Graph. Stat. (2012). doi:10.1080/10618600.2012.681219
Lin, X.H.: Estimation using penalized quasilikelihood and quasi-pseudo-likelihood in Poisson mixed models. Lifetime Data Anal. 13, 533–544 (2007)
Lin, X.H., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)
Pan, J., Thompson, R.: Quasi-Monte Carlo estimation in generalized linear mixed models. Comput. Stat. Data Anal. 51, 5765–5775 (2007)
Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)
Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78, 719–727 (1991)
Schelldorfer, J., Bühlmann, P.: GLMMLasso: an algorithm for high-dimensional generalized linear mixed models using L1-penalization. Preprint, ETH Zurich (2011). http://stat.ethz.ch/people/schell
Stiratelli, R., Laird, N.M., Ware, J.H.: Random effects models for serial observations with binary response. Biometrics 40, 961–971 (1984)
Tibshirani, R.J.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Ye, H.J., Pan, J.X.: Modelling covariance structures in generalized estimating equations for longitudinal data. Biometrika 93, 927–941 (2006)
Zeger, S.L., Liang, K., Albert, P.S.: Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060 (1988)
Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgements
Pan’s research was supported by a grant from the Royal Society of the UK, and Huang’s research was funded by a scholarship from the University of Manchester. We would like to thank two anonymous referees and the Editor and Associate Editor for their constructive comments and helpful suggestions.
Appendix: Derivation of the first- and second-order derivatives
Part 1.
First we show how to derive \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\). The main idea is to approximate \(p_{\psi}(|\theta_{1}|)\) locally by a quadratic function, a direct result of Fan and Li (2001). Suppose that we already have \(\theta_{1}^{(k-1)}=(\lambda_{1}^{(k-1)},\dots,\lambda_{q}^{(k-1)})\) close to the minimizer of (2.5). Then \(p_{\psi}(|\lambda_{l}|)\), \(l=1,\dots,q\), can be locally approximated by a quadratic function as
for \(\lambda_{l}\approx \lambda_{l}^{(k-1)}\), where \(\operatorname{sgn}(\lambda_{l})\) is the sign of \(\lambda_{l}\). In other words,
Thus, with the locally approximated form of \(p_{\psi}(|\theta_{1}|)=(p_{\psi}(|\lambda_{1}|),\dots,p_{\psi}(|\lambda_{q}|))\), we can derive its second- and first-order derivatives, i.e., \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\), at the given value \(\theta_{1}^{(k-1)}\).
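For reference, the local quadratic approximation of Fan and Li (2001), restated in the notation of this appendix, is

```latex
p_{\psi}(|\lambda_{l}|) \approx p_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)
  + \frac{1}{2}\,
    \frac{p'_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)}{|\lambda_{l}^{(k-1)}|}
    \Bigl(\lambda_{l}^{2}-\bigl(\lambda_{l}^{(k-1)}\bigr)^{2}\Bigr),
\qquad
p'_{\psi}(|\lambda_{l}|)\operatorname{sgn}(\lambda_{l})
  \approx \frac{p'_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)}{|\lambda_{l}^{(k-1)}|}\,\lambda_{l}.
```

Differentiating the quadratic form twice then yields the diagonal entries of \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) as \(p'_{\psi}(|\lambda_{l}^{(k-1)}|)/|\lambda_{l}^{(k-1)}|\).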
Part 2.
Without loss of generality, we suppress the superscript \((k-1)\) in \(\theta_{i}^{(k-1)}\). We show how to obtain the elements of \(\nabla l_{2}(\theta_{i})\) and \(\nabla^{2} l_{2}(\theta_{i})\), and similarly of \(\nabla l_{2R}(\theta_{i})\) and \(\nabla^{2} l_{2R}(\theta_{i})\), \(i=1,2\), given \(\theta_{1}=(\lambda_{1},\dots,\lambda_{q})\) and \(\theta_{2}=(\gamma_{21};\gamma_{31},\gamma_{32};\dots;\gamma_{q1},\dots,\gamma_{q(q-1)})\).
Writing \(l_{2}(\hat{\beta},\theta)=-\frac{1}{2}\sum_{i=1}^{n}\log|V_{i}|-\frac{1}{2}\sum_{i=1}^{n}(Y_{i}-X_{i}\hat{\beta})^{T}V_{i}^{-1}(Y_{i}-X_{i}\hat{\beta})\) and letting \(e_{i}=Y_{i}-X_{i}\hat{\beta}\), we have
Similarly, we have
First we deal with the elements of \(\nabla l_{2}(\theta_{1})\) and \(\nabla^{2} l_{2}(\theta_{1})\), i.e., \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\). From (A.1),
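These derivatives follow from the standard matrix-calculus identities \(\partial\log|V|/\partial\theta=\operatorname{tr}(V^{-1}\,\partial V/\partial\theta)\) and \(\partial V^{-1}/\partial\theta=-V^{-1}(\partial V/\partial\theta)V^{-1}\). Applied to (A.1), they give the generic forms below (a sketch in the notation above; the same expressions hold with \(\lambda\) replaced by any other component of \(\theta\)):

```latex
\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}
 = -\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\!\Bigl(V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}\Bigr)
   +\frac{1}{2}\sum_{i=1}^{n} e_{i}^{T}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}e_{i},
\\[6pt]
\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}
 = \frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\!\Bigl(
     V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}
     - V_{i}^{-1}\frac{\partial^{2} V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}\Bigr)
   +\frac{1}{2}\sum_{i=1}^{n} e_{i}^{T}V_{i}^{-1}\Bigl(
     \frac{\partial^{2} V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}
     - \frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}
     - \frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}
     \Bigr)V_{i}^{-1}e_{i}.
```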
Similarly we can calculate the elements of \(\nabla l_{2R}(\theta_{1})\) and \(\nabla^{2} l_{2R}(\theta_{1})\), i.e., \(\frac{\partial l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\). From (A.2),
For the derivatives with respect to \(\theta_{2}\), we have similar expressions. Denote the elements of \(\nabla l_{2}(\theta_{2})\) by \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}}\), where \(s_{1}=2,\dots,q\) and \(r_{1}=1,\dots,s_{1}-1\), and denote the elements of \(\nabla^{2} l_{2}(\theta_{2})\) by \(\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\), where \(s_{2}=2,\dots,q\) and \(r_{2}=1,\dots,s_{2}-1\). From (A.1),
Similarly, from (A.2), we have
Finally, we give the explicit forms of \(\frac{\partial V_{k}}{\partial\lambda_{l}}\) (and \(\frac{\partial V_{k}}{\partial\lambda_{m}}\)), \(\frac{\partial^{2} V_{k}}{\partial\lambda_{l}\partial\lambda_{m}}\), \(\frac{\partial V_{k}}{\partial\gamma_{s_{1}r_{1}}}\) (and \(\frac{\partial V_{k}}{\partial\gamma_{s_{2}r_{2}}}\)), and \(\frac{\partial^{2} V_{k}}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\) appearing in the above equations. First we have
where \(\frac{\partial{\varLambda}}{\partial{\lambda_{l}}}\) is a q×q matrix with 1 at the lth diagonal entry and 0 at all the other entries. Also we have
where \(\frac{\partial{\varLambda}}{\partial{\lambda_{m}}}\) is a q×q matrix with 1 at the mth diagonal entry and 0 at all the other entries.
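If the marginal covariance takes the form \(V_{k}=Z_{k}\varLambda\varGamma\varGamma^{T}\varLambda Z_{k}^{T}+R_{k}\) with \(R_{k}\) free of \(\theta\) (an assumption of this sketch, consistent with the second-derivative expressions in this appendix), the product rule gives

```latex
\frac{\partial V_{k}}{\partial\lambda_{l}}
 = Z_{k}\Bigl(\frac{\partial\varLambda}{\partial\lambda_{l}}\,\varGamma\varGamma^{T}\varLambda
   + \varLambda\varGamma\varGamma^{T}\,\frac{\partial\varLambda}{\partial\lambda_{l}}\Bigr)Z_{k}^{T}.
```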
If \(l\neq m\), it is easy to see that \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}\) is the \(q\times q\) zero matrix; if \(l=m\), \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}=2Z_{k}U_{l}Z_{k}^{T}\), where \(U_{l}\) is a \(q\times q\) matrix with \(1+\sum_{i=1}^{l-1}\gamma_{li}^{2}\) at the \(l\)th diagonal entry and 0 at all the other entries.
The explicit form of \(\frac{\partial{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}}\) is
where \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}\) is a \(q\times q\) matrix with 1 at the \((s_{1},r_{1})\) entry and 0 at all the other entries. Furthermore, we have
If \(r_{1}\neq r_{2}\), \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) is the zero matrix, and hence \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) is also the zero matrix. If \(r_{1}=r_{2}\), \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) is a matrix with 1 at the \((s_{1},s_{2})\) entry and 0 at all the other entries. In that case, if \(s_{1}\neq s_{2}\), \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) is the zero matrix since \(\varLambda\) is diagonal; if \(s_{1}=s_{2}\), \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}=2Z_{k}S_{s_{1}}Z_{k}^{T}\), where \(S_{s_{1}}\) is a \(q\times q\) matrix with \(\lambda_{s_{1}}^{2}\) at the \(s_{1}\)th diagonal entry and 0 at all the other entries.
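These derivative formulas can be checked numerically. The sketch below is an illustration rather than code from the paper; it assumes \(V_{k}=Z_{k}\varLambda\varGamma\varGamma^{T}\varLambda Z_{k}^{T}\) (dropping the \(\theta\)-free residual term, which does not affect the derivatives) and compares the analytic expressions for \(\frac{\partial V_{k}}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} V_{k}}{\partial\lambda_{l}^{2}}=2Z_{k}U_{l}Z_{k}^{T}\) with finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n_k = 3, 5

# Modified Cholesky factors: Lambda diagonal, Gamma unit lower triangular.
lam = rng.uniform(0.5, 2.0, size=q)
Gamma = np.eye(q)
Gamma[np.tril_indices(q, -1)] = rng.normal(size=q * (q - 1) // 2)
Z = rng.normal(size=(n_k, q))

def V(lam_vec):
    """V_k = Z Lambda Gamma Gamma^T Lambda Z^T (residual term omitted)."""
    Lam = np.diag(lam_vec)
    return Z @ Lam @ Gamma @ Gamma.T @ Lam @ Z.T

def dV_dlam(l):
    """Analytic dV/dlambda_l via the product rule."""
    E = np.zeros((q, q)); E[l, l] = 1.0   # dLambda/dlambda_l
    Lam = np.diag(lam)
    return Z @ (E @ Gamma @ Gamma.T @ Lam + Lam @ Gamma @ Gamma.T @ E) @ Z.T

def dV_dlam_fd(l, h=1e-6):
    """Central finite-difference approximation of dV/dlambda_l."""
    lp, lm = lam.copy(), lam.copy()
    lp[l] += h; lm[l] -= h
    return (V(lp) - V(lm)) / (2 * h)

def d2V_dlam2(l):
    """Analytic d^2V/dlambda_l^2 = 2 Z U_l Z^T with (U_l)_{ll} = 1 + sum_{i<l} gamma_{li}^2."""
    U = np.zeros((q, q))
    U[l, l] = 1.0 + np.sum(Gamma[l, :l] ** 2)
    return 2 * Z @ U @ Z.T

def d2V_fd(l, h=1e-4):
    """Central second difference in lambda_l."""
    lp, lm = lam.copy(), lam.copy()
    lp[l] += h; lm[l] -= h
    return (V(lp) - 2 * V(lam) + V(lm)) / h**2

for l in range(q):
    assert np.allclose(dV_dlam(l), dV_dlam_fd(l), atol=1e-6)
    assert np.allclose(d2V_dlam2(l), d2V_fd(l), atol=1e-4)
```

The same check, with \(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\) in place of \(\frac{\partial\varLambda}{\partial\lambda_{l}}\), verifies the \(\gamma\) derivatives and the \(2Z_{k}S_{s_{1}}Z_{k}^{T}\) form above.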
Pan, J., Huang, C. Random effects selection in generalized linear mixed models via shrinkage penalty function. Stat Comput 24, 725–738 (2014). https://doi.org/10.1007/s11222-013-9398-0