Smoothing and mixed models

Computational Statistics

Summary

Smoothing methods that use basis functions with penalisation can be formulated as maximum likelihood estimators and best predictors in a mixed model framework. Such connections are at least a quarter of a century old but, perhaps with the advent of mixed model software, have led to a paradigm shift in the field of smoothing. The reason is that most, perhaps all, models involving smoothing can be expressed as a mixed model and hence enjoy the benefit of the growing body of methodology and software for general mixed model analysis. The handling of other complications such as clustering, missing data and measurement error is generally quite straightforward with mixed model representations of smoothing.

Acknowledgements

The ideas summarised in this article are the result of interaction with several of my colleagues at Harvard School of Public Health in the period 1997–2002: Babette Brumback, Tianxi Cai, Brent Coull, Jonathan French, Bhaswati Ganguli, Erin Kammann, Long Ngo, Nan Laird, Helen Parise, Louise Ryan, Misha Salganik, Joel Schwartz, John Staudenmayer, Sally Thurston, Jim Ware and Yihua Zhao. The paper has also benefited greatly from conversations with Marc Aerts, Ray Carroll, Gerda Claeskens, Ciprian Crainiceanu, Maria Durban, Jim Hobert, Robert Kohn, Xihong Lin, Mary Lindstrom, Michael O’Connell, José Pinheiro and David Ruppert. I am grateful to Professors Trevor Hastie and Gareth James for making the spinal bone mineral density data available. Finally, thank you to the participants in the Euroworkshop on Nonparametric Models (HPCFCT-2000-00041) held in Bernried, Germany in November 2001, and to its co-organiser, Göran Kauermann, for encouraging me to write this paper. This paper was supported by U.S. National Institute of Environmental Health Sciences grant R01-ES10844-01.

Appendix: Demmler-Reinsch orthogonalisation

Suppose that X and Z contain the fixed and random effect basis functions for a scatterplot smooth (e.g. as in Section 3.2 or Section 3.6.1). Then, as shown in Section 3.2.1, penalised spline regression corresponds to the ridge regression

$${\widehat {\bf{f}}_\alpha } = {\bf{C}}{\left( {{{\bf{C}}^{\bf\top }}{\bf{C}} + \alpha {\bf{D}}} \right)^{ - 1}}{{\bf{C}}^{\bf{ \top} }}{\bf{y}}$$
(9.1)

for some diagonal matrix D and with C = [X Z]. Here α controls the amount of smoothing and in the mixed model formulation of penalised splines \(\alpha = \sigma _\varepsilon ^2/\sigma _u^2\). Algorithm 1 allows for fast and stable calculation of (9.1).

The Cholesky decomposition applies only to nonsingular matrices. If C is ill-conditioned, it is advisable to add a small multiple of D to \({\bf C}^\top{\bf C}\) before applying the Cholesky decomposition, so that

$${{\bf{C}}^{\bf{ \top} }}{\bf{C}}\; + \;\delta {\bf{D}} = {{\bf{R}}^{\bf \top} }{\bf{R}},$$

where δ is small, e.g., \(\delta = 10^{-10}\).

Once the matrix A and the vectors b and s have been computed, obtaining the vector of fitted values for any value of α reduces to a matrix multiplication. Therefore, \({\widehat {\bf{f}}_\alpha }\) can be computed cheaply for several α values. This is particularly useful for automatic smoothing parameter selection.
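Algorithm 1 itself is not reproduced in this text, so the following is only a minimal NumPy sketch of the computation, reconstructed from the justification given in this appendix. The function names `demmler_reinsch` and `fitted_values`, the simulated data, and the truncated-line basis with 20 knots are illustrative assumptions rather than the paper's own code or example.

```python
import numpy as np

def demmler_reinsch(C, D, y, delta=0.0):
    """Precompute A, b, s so that f_alpha = A @ (b / (1 + alpha * s)).

    Hypothetical helper: a sketch of the steps justified in the appendix.
    """
    # Upper-triangular R with C^T C + delta * D = R^T R.
    # np.linalg.cholesky returns the lower factor, which is R^T.
    R = np.linalg.cholesky(C.T @ C + delta * D).T
    R_inv = np.linalg.inv(R)  # adequate for a sketch; a triangular solve is preferable
    # Symmetric eigendecomposition R^{-T} D R^{-1} = U diag(s) U^T.
    M = R_inv.T @ D @ R_inv
    s, U = np.linalg.eigh((M + M.T) / 2.0)
    s = np.clip(s, 0.0, None)  # remove tiny negative values caused by round-off
    A = C @ R_inv @ U
    b = A.T @ y
    return A, b, s

def fitted_values(A, b, s, alpha):
    """Fitted values for one smoothing parameter (elementwise division)."""
    return A @ (b / (1.0 + alpha * s))

# Illustrative data and basis (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)
knots = np.linspace(0.05, 0.95, 20)

X = np.column_stack([np.ones(n), x])                # fixed effects: 1, x
Z = np.maximum(x[:, None] - knots[None, :], 0.0)    # random effects: (x - knot)_+
C = np.hstack([X, Z])
D = np.diag(np.r_[np.zeros(2), np.ones(len(knots))])  # penalise only the Z columns

A, b, s = demmler_reinsch(C, D, y)
alphas = [0.01, 0.1, 1.0, 10.0]
fits = {a: fitted_values(A, b, s, a) for a in alphas}
```

The decomposition is done once; each additional value of α then costs only an elementwise division and one multiplication by A, which is what makes a grid search over α inexpensive.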

Justification of Algorithm 1

Now

$${{\bf{R}}^{ - \top }}{\bf{D}}{{\bf{R}}^{ - 1}} = {\bf{U}}{\mathop{\rm diag}\nolimits} ({\bf{s}}){{\bf{U}}^ \top }\quad {\rm{ with }}\quad {{\bf{U}}^ \top }{\bf{U}} = {\bf{I}}.$$

Since U is a square matrix, \({\bf U}^\top = {\bf U}^{-1}\) and so

$${\bf{D}}\; = \;{{\bf{R}}^ \top }{\bf{U}}{\mathop{\rm diag}\nolimits} ({\bf{s}}){{\bf{U}}^{ - 1}}\;{\bf{R}}.$$

Also,

$${{\bf{C}}^ \top }{\bf{C}}\; = \;{{\bf{R}}^ \top }{\bf{R}} = {{\bf{R}}^ \top }{\bf{U}}{{\bf{U}}^{ - 1}}{\bf{R}}$$

and consequently

$${{\bf{C}}^ \top }{\bf{C}}\; + \;\alpha {\bf{D}} = \;{{\bf{R}}^ \top }{\bf{U}}\left\{ {\bf{I}}\; + \;\alpha\, {\rm{diag}}\left( {\bf s} \right) \right\}{{\bf{U}}^{ - 1}}{\bf{R}}.$$

Hence

$$\begin{aligned} {\widehat {\bf{f}}_\alpha } &= {\bf{C}}{\left[ {{\bf{R}}^\top }{\bf{U}}\left\{ {\bf{I}} + \alpha\, {\rm{diag}}({\bf{s}}) \right\}{{\bf{U}}^{ - 1}}{\bf{R}} \right]^{ - 1}}{{\bf{C}}^\top }{\bf{y}} \\ &= ({\bf{C}}{{\bf{R}}^{ - 1}}{\bf{U}}){\left\{ {\rm{diag}}({\bf{1}} + \alpha {\bf{s}}) \right\}^{ - 1}}{({\bf{C}}{{\bf{R}}^{ - 1}}{\bf{U}})^\top }{\bf{y}} = {\bf{A}}\left( \frac{{\bf{b}}}{{\bf{1}} + \alpha {\bf{s}}} \right) \end{aligned}$$

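where \({\bf A} \equiv {\bf C}{\bf R}^{-1}{\bf U}\) and \({\bf b} \equiv {\bf A}^\top{\bf y}\), and the division is taken elementwise.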

The new expression for \({\widehat {\bf{f}}_\alpha }\) is thus of the form

$${\widehat {\bf{f}}_\alpha } = {\bf{A}}{\left\{ {{{\bf{A}}^ \top }{\bf{A}}\; + \;\alpha {\rm{diag}}\left( {\bf{s}} \right)} \right\}^{ - 1}}{{\bf{A}}^ \top }{\bf{y}}.$$

Comparison with (9.1) shows that we have effectively replaced the basis functions in C with those in A, where this design matrix has the orthogonality property \({\bf A}^\top{\bf A} = {\bf I}\). The columns of A correspond to the Demmler-Reinsch basis for the vector space spanned by C. The orthogonality property is crucial for the fast computation over several smoothing parameters.
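The orthogonality of A is not derived explicitly above. As a worked step added here for completeness, it follows directly from the definition of A together with \({\bf C}^\top{\bf C} = {\bf R}^\top{\bf R}\) (that is, taking δ = 0) and \({\bf U}^\top{\bf U} = {\bf I}\):

$${{\bf{A}}^\top }{\bf{A}} = {{\bf{U}}^\top }{{\bf{R}}^{ - \top }}{{\bf{C}}^\top }{\bf{C}}{{\bf{R}}^{ - 1}}{\bf{U}} = {{\bf{U}}^\top }{{\bf{R}}^{ - \top }}{{\bf{R}}^\top }{\bf{R}}{{\bf{R}}^{ - 1}}{\bf{U}} = {{\bf{U}}^\top }{\bf{U}} = {\bf{I}}.$$

Consequently \({\bf A}^\top{\bf A} + \alpha\,{\rm diag}({\bf s}) = {\rm diag}({\bf 1} + \alpha{\bf s})\), a diagonal matrix whose inverse is obtained by elementwise reciprocals. No linear system has to be solved when α changes.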

Cite this article

Wand, M.P. Smoothing and mixed models. Computational Statistics 18, 223–249 (2003). https://doi.org/10.1007/s001800300142
