Abstract
In this paper, we discuss the selection of random effects within the framework of generalized linear mixed models (GLMMs). Based on a reparametrization of the covariance matrix of the random effects in terms of the modified Cholesky decomposition, we propose adding a shrinkage penalty term to the penalized quasi-likelihood (PQL) function of the variance components in order to select effective random effects. The shrinkage penalty term is taken as a function of the variance of the random effects, motivated by the fact that if the variance is zero then the corresponding variable is no longer random (with probability one). The proposed method takes advantage of the computational convenience of PQL estimation and of the appealing properties of certain shrinkage penalty functions such as the LASSO and SCAD. We propose a backfitting algorithm that estimates the fixed effects and variance components in GLMMs and simultaneously selects the effective random effects. Simulation studies show that the proposed approach performs well in selecting effective random effects in GLMMs, and the approach is further illustrated through real data analysis.
References
Ahn, M., Zhang, H.H., Lu, W.: Moment-based method for random effects selection in linear mixed models. Stat. Sin. 22, 1539–1562 (2012)
Bondell, H.D., Krishna, A., Ghosh, S.K.: Joint variable selection for fixed and random effects in linear mixed models. Biometrics 66, 1069–1077 (2010)
Breiman, L.: Heuristics of instability and stabilization in model selection. Ann. Stat. 24, 2350–2383 (1996)
Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)
Breslow, N.E., Lin, X.H.: Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika 82, 81–91 (1995)
Chen, Z., Dunson, D.B.: Random effects selection in linear mixed models. Biometrics 59, 762–769 (2003)
Fahrmeir, L., Kneib, T., Konrath, S.: Bayesian regularisation in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection. Stat. Comput. 20, 203–219 (2010)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, Y., Li, R.Z.: Variable selection in linear mixed effects models. Ann. Stat. 40, 2043–2068 (2012)
Frank, I.E., Friedman, J.H.: A statistical view of some chemometric regression tools (with discussion). Technometrics 35, 109–148 (1993)
Groll, A., Tutz, G.: Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. (2013). doi:10.1007/s11222-012-9359-z
Groll, A., Tutz, G.: Regularization for generalized additive mixed models by likelihood-based boosting. Methods Inf. Med. 51, 168–177 (2012)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970a)
Hoerl, A.E., Kennard, R.W.: Ridge regression: application to nonorthogonal problems. Technometrics 12, 69–82 (1970b)
Ibrahim, J.G., Zhu, H., Garcia, R.I., Guo, R.: Fixed and random effects selection in mixed effects models. Biometrics 67, 495–503 (2011)
Kaslow, R.A., Ostrow, D.G., Detels, R., et al.: The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am. J. Epidemiol. 126, 310–318 (1987)
Kinney, S.K., Dunson, D.B.: Fixed and random effects selection in linear and logistic models. Biometrics 63, 690–698 (2007)
Lin, B., Pang, Z., Jiang, J.: Fixed and random effects selection by REML and pathwise coordinate optimization. J. Comput. Graph. Stat. (2012). doi:10.1080/10618600.2012.681219
Lin, X.H.: Estimation using penalized quasilikelihood and quasi-pseudo-likelihood in Poisson mixed models. Lifetime Data Anal. 13, 533–544 (2007)
Lin, X.H., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)
Pan, J., Thompson, R.: Quasi-Monte Carlo estimation in generalized linear mixed models. Comput. Stat. Data Anal. 51, 5765–5775 (2007)
Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)
Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78, 719–727 (1991)
Schelldorfer, J., Bühlmann, P.: GLMMLasso: an algorithm for high-dimensional generalized linear mixed models using L1-penalization. Preprint, ETH Zurich (2011). http://stat.ethz.ch/people/schell
Stiratelli, R., Laird, N.M., Ware, J.H.: Random effects models for serial observations with binary response. Biometrics 40, 961–971 (1984)
Tibshirani, R.J.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Ye, H.J., Pan, J.X.: Modelling covariance structures in generalized estimating equations for longitudinal data. Biometrika 93, 927–941 (2006)
Zeger, S.L., Liang, K., Albert, P.S.: Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060 (1988)
Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgements
Pan’s research was supported by a grant from the Royal Society of the UK, and Huang’s research was funded by a scholarship from the University of Manchester. We would like to thank two anonymous referees and the Editor and Associate Editor for their constructive comments and helpful suggestions.
Appendix: Derivation of the first- and second-order derivatives
Part 1.
First we show how to derive \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\). The main idea is to approximate \(p_{\psi}(|\theta_{1}|)\) locally by a quadratic function, a direct result of Fan and Li (2001). Suppose that we already have \(\theta_{1}^{(k-1)}=(\lambda_{1}^{(k-1)},\dots,\lambda_{q}^{(k-1)})\) close to the minimizer of (2.5). Then \(p_{\psi}(|\lambda_{l}|)\), \(l=1,\dots,q\), can be locally approximated by a quadratic function as
for \(\lambda_{l}\approx \lambda_{l}^{(k-1)}\), where \(\operatorname{sgn}(\lambda_{l})\) is the sign of \(\lambda_{l}\). In other words,
Thus, with the locally approximated form of \(p_{\psi}(|\theta_{1}|)=(p_{\psi}(|\lambda_{1}|),\dots,p_{\psi}(|\lambda_{q}|))\), we can derive its second- and first-order derivatives, i.e., \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\), at the given value \(\theta_{1}^{(k-1)}\).
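For reference, the local quadratic approximation of Fan and Li (2001), restated in the notation of this appendix, is

```latex
p_{\psi}(|\lambda_{l}|) \approx p_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)
  + \frac{1}{2}\,
    \frac{p'_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)}{|\lambda_{l}^{(k-1)}|}
    \Bigl(\lambda_{l}^{2}-\bigl(\lambda_{l}^{(k-1)}\bigr)^{2}\Bigr),
\qquad
p'_{\psi}(|\lambda_{l}|)\operatorname{sgn}(\lambda_{l})
  \approx \frac{p'_{\psi}\bigl(|\lambda_{l}^{(k-1)}|\bigr)}{|\lambda_{l}^{(k-1)}|}\,\lambda_{l}.
```

Differentiating the quadratic form twice then yields the diagonal entries of \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) as \(p'_{\psi}(|\lambda_{l}^{(k-1)}|)/|\lambda_{l}^{(k-1)}|\).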
Part 2.
Without loss of generality, we suppress the superscript \((k-1)\) in \(\theta_{i}^{(k-1)}\). We show how to obtain the elements of \(\nabla l_{2}(\theta_{i})\) and \(\nabla^{2} l_{2}(\theta_{i})\), and similarly of \(\nabla l_{2R}(\theta_{i})\) and \(\nabla^{2} l_{2R}(\theta_{i})\), \(i=1,2\), given \(\theta_{1}=(\lambda_{1},\dots,\lambda_{q})\) and \(\theta_{2}=(\gamma_{21};\gamma_{31},\gamma_{32};\dots;\gamma_{q1},\dots,\gamma_{q(q-1)})\).
Writing \(l_{2}(\hat{\beta},\theta)=-\frac{1}{2}\sum_{i=1}^{n}\log|V_{i}|-\frac{1}{2}\sum_{i=1}^{n}(Y_{i}-X_{i}\hat{\beta})^{T}V_{i}^{-1}(Y_{i}-X_{i}\hat{\beta})\) and letting \(e_{i}=Y_{i}-X_{i}\hat{\beta}\), we have
Similarly, we have
First we deal with the elements of \(\nabla l_{2}(\theta_{1})\) and \(\nabla^{2} l_{2}(\theta_{1})\), i.e., \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\). From (A.1),
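These derivatives follow from the standard matrix-calculus identities \(\partial\log|V|/\partial\theta=\operatorname{tr}(V^{-1}\,\partial V/\partial\theta)\) and \(\partial V^{-1}/\partial\theta=-V^{-1}(\partial V/\partial\theta)V^{-1}\). Applied to (A.1), they give the generic forms below (a sketch in the notation above; the same expressions hold with \(\lambda\) replaced by any other component of \(\theta\)):

```latex
\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}
 = -\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\!\Bigl(V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}\Bigr)
   +\frac{1}{2}\sum_{i=1}^{n} e_{i}^{T}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}e_{i},
\\[6pt]
\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}
 = \frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\!\Bigl(
     V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}
     - V_{i}^{-1}\frac{\partial^{2} V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}\Bigr)
   +\frac{1}{2}\sum_{i=1}^{n} e_{i}^{T}V_{i}^{-1}\Bigl(
     \frac{\partial^{2} V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}
     - \frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}
     - \frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}
     \Bigr)V_{i}^{-1}e_{i}.
```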
Similarly we can calculate the elements of \(\nabla l_{2R}(\theta_{1})\) and \(\nabla^{2} l_{2R}(\theta_{1})\), i.e., \(\frac{\partial l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\). From (A.2),
For the derivatives with respect to \(\theta_{2}\), we have similar expressions. Denote the elements of \(\nabla l_{2}(\theta_{2})\) by \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}}\), where \(s_{1}=2,\dots,q\) and \(r_{1}=1,\dots,s_{1}-1\), and denote the elements of \(\nabla^{2} l_{2}(\theta_{2})\) by \(\frac{\partial^{2} l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\), where \(s_{2}=2,\dots,q\) and \(r_{2}=1,\dots,s_{2}-1\). From (A.1),
Similarly, from (A.2), we have
Finally, we give the explicit forms of \(\frac{\partial V_{k}}{\partial\lambda_{l}}\) (and \(\frac{\partial V_{k}}{\partial\lambda_{m}}\)), \(\frac{\partial^{2} V_{k}}{\partial\lambda_{l}\partial\lambda_{m}}\), \(\frac{\partial V_{k}}{\partial\gamma_{s_{1}r_{1}}}\) (and \(\frac{\partial V_{k}}{\partial\gamma_{s_{2}r_{2}}}\)), and \(\frac{\partial^{2} V_{k}}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\) appearing in the above equations. First we have
where \(\frac{\partial{\varLambda}}{\partial{\lambda_{l}}}\) is a q×q matrix with 1 at the lth diagonal entry and 0 at all the other entries. Also we have
where \(\frac{\partial{\varLambda}}{\partial{\lambda_{m}}}\) is a q×q matrix with 1 at the mth diagonal entry and 0 at all the other entries.
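If the marginal covariance takes the form \(V_{k}=Z_{k}\varLambda\varGamma\varGamma^{T}\varLambda Z_{k}^{T}+R_{k}\) with \(R_{k}\) free of \(\theta\) (an assumption of this sketch, consistent with the second-derivative expressions in this appendix), the product rule gives

```latex
\frac{\partial V_{k}}{\partial\lambda_{l}}
 = Z_{k}\Bigl(\frac{\partial\varLambda}{\partial\lambda_{l}}\,\varGamma\varGamma^{T}\varLambda
   + \varLambda\varGamma\varGamma^{T}\,\frac{\partial\varLambda}{\partial\lambda_{l}}\Bigr)Z_{k}^{T}.
```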
If \(l\neq m\), it is easy to see that \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}\) is the \(q\times q\) zero matrix; if \(l=m\), \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}=2Z_{k}U_{l}Z_{k}^{T}\), where \(U_{l}\) is a \(q\times q\) matrix with \(1+\sum_{i=1}^{l-1}\gamma_{li}^{2}\) at the \(l\)th diagonal entry and 0 at all the other entries.
The explicit form of \(\frac{\partial{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}}\) is
where \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}\) is a \(q\times q\) matrix with 1 at the \((s_{1},r_{1})\) entry and 0 at all the other entries. Furthermore, we have
If \(r_{1}\neq r_{2}\), \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) is the zero matrix, and hence \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) is also the zero matrix. If \(r_{1}=r_{2}\), \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) is a matrix with 1 at the \((s_{1},s_{2})\) entry and 0 at all the other entries. In that case, if \(s_{1}\neq s_{2}\), \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) is the zero matrix since \(\varLambda\) is diagonal; if \(s_{1}=s_{2}\), \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}=2Z_{k}S_{s_{1}}Z_{k}^{T}\), where \(S_{s_{1}}\) is a \(q\times q\) matrix with \(\lambda_{s_{1}}^{2}\) at the \(s_{1}\)th diagonal entry and 0 at all the other entries.
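These derivative formulas can be checked numerically. The sketch below is an illustration rather than code from the paper; it assumes \(V_{k}=Z_{k}\varLambda\varGamma\varGamma^{T}\varLambda Z_{k}^{T}\) (dropping the \(\theta\)-free residual term, which does not affect the derivatives) and compares the analytic expressions for \(\frac{\partial V_{k}}{\partial\lambda_{l}}\) and \(\frac{\partial^{2} V_{k}}{\partial\lambda_{l}^{2}}=2Z_{k}U_{l}Z_{k}^{T}\) with finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n_k = 3, 5

# Modified Cholesky factors: Lambda diagonal, Gamma unit lower triangular.
lam = rng.uniform(0.5, 2.0, size=q)
Gamma = np.eye(q)
Gamma[np.tril_indices(q, -1)] = rng.normal(size=q * (q - 1) // 2)
Z = rng.normal(size=(n_k, q))

def V(lam_vec):
    """V_k = Z Lambda Gamma Gamma^T Lambda Z^T (residual term omitted)."""
    Lam = np.diag(lam_vec)
    return Z @ Lam @ Gamma @ Gamma.T @ Lam @ Z.T

def dV_dlam(l):
    """Analytic dV/dlambda_l via the product rule."""
    E = np.zeros((q, q)); E[l, l] = 1.0   # dLambda/dlambda_l
    Lam = np.diag(lam)
    return Z @ (E @ Gamma @ Gamma.T @ Lam + Lam @ Gamma @ Gamma.T @ E) @ Z.T

def dV_dlam_fd(l, h=1e-6):
    """Central finite-difference approximation of dV/dlambda_l."""
    lp, lm = lam.copy(), lam.copy()
    lp[l] += h; lm[l] -= h
    return (V(lp) - V(lm)) / (2 * h)

def d2V_dlam2(l):
    """Analytic d^2V/dlambda_l^2 = 2 Z U_l Z^T with (U_l)_{ll} = 1 + sum_{i<l} gamma_{li}^2."""
    U = np.zeros((q, q))
    U[l, l] = 1.0 + np.sum(Gamma[l, :l] ** 2)
    return 2 * Z @ U @ Z.T

def d2V_fd(l, h=1e-4):
    """Central second difference in lambda_l."""
    lp, lm = lam.copy(), lam.copy()
    lp[l] += h; lm[l] -= h
    return (V(lp) - 2 * V(lam) + V(lm)) / h**2

for l in range(q):
    assert np.allclose(dV_dlam(l), dV_dlam_fd(l), atol=1e-6)
    assert np.allclose(d2V_dlam2(l), d2V_fd(l), atol=1e-4)
```

The same check, with \(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\) in place of \(\frac{\partial\varLambda}{\partial\lambda_{l}}\), verifies the \(\gamma\) derivatives and the \(2Z_{k}S_{s_{1}}Z_{k}^{T}\) form above.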
Pan, J., Huang, C. Random effects selection in generalized linear mixed models via shrinkage penalty function. Stat Comput 24, 725–738 (2014). https://doi.org/10.1007/s11222-013-9398-0