Abstract
Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available, and effective algorithms for specific penalties have been proposed. However, little is available for fitting models that combine different penalties. When modeling the rent data of Munich, as in our application, various types of predictors call for a combination of a Ridge, a group Lasso and a Lasso-type penalty within one model. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models, such that penalties of various kinds can be combined in one model. The approach is very general: the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation (SCAD) penalty, the elastic net and many more penalties are embedded. The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement. New penalties can be incorporated quickly. The approach is extended to penalties with vector-based arguments. There are several possibilities for choosing the penalty parameter(s). A software implementation is available. Some illustrative examples show promising results.
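The idea of combining several penalties through a quadratic approximation can be illustrated with a minimal sketch. It is not the authors' software and makes several simplifying assumptions: the response is Gaussian, so each re-weighted least squares step reduces to a penalized least squares solve; the objective is taken to be 0.5·||y − Xβ||² + Σ λ·p(a'β); and |ξ| is smoothed as sqrt(ξ² + c) with a small constant c. The names `solve`, `fit_combined`, `terms` and `c` are hypothetical and chosen for illustration only.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_combined(X, y, terms, c=1e-8, max_iter=500, tol=1e-10):
    """Penalized least squares with a combination of penalties.

    terms is a list of (a, lam, kind) tuples. Each term penalizes the
    scalar linear transformation a'beta with tuning parameter lam;
    kind is "lasso" (absolute value) or "ridge" (square).
    """
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Start from a least squares fit, stabilized by a tiny ridge term.
    start = [[XtX[j][k] + (1e-8 if j == k else 0.0) for k in range(p)]
             for j in range(p)]
    beta = solve(start, Xty)
    for _ in range(max_iter):
        # Build X'X plus the quadratic approximation of all penalty terms
        # at the current estimate.
        A = [row[:] for row in XtX]
        for a, lam, kind in terms:
            xi = sum(aj * bj for aj, bj in zip(a, beta))
            if kind == "lasso":
                w = lam / math.sqrt(xi * xi + c)  # gradient of lam*sqrt(xi^2+c) is w*xi
            elif kind == "ridge":
                w = 2.0 * lam                     # gradient of lam*xi^2 is w*xi
            else:
                raise ValueError("unknown penalty kind: " + kind)
            for j in range(p):
                for k in range(p):
                    A[j][k] += w * a[j] * a[k]
        new_beta = solve(A, Xty)
        converged = max(abs(u - v) for u, v in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Usage: a Lasso term, a fused-Lasso-type difference and a Ridge term
# in one model (toy data with an identity design matrix).
X_demo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_demo = [3.0, 0.5, 3.0]
terms_demo = [
    ([1.0, 0.0, 0.0], 1.0, "lasso"),   # penalizes |beta_0|
    ([1.0, -1.0, 0.0], 0.5, "lasso"),  # penalizes |beta_0 - beta_1|
    ([0.0, 0.0, 1.0], 1.0, "ridge"),   # penalizes beta_2 squared
]
beta_demo = fit_combined(X_demo, y_demo, terms_demo)
```

Because every penalty term enters only through a weight on the rank-one matrix a·a', adding a further penalty type in this sketch means adding one more branch that returns its weight; the solver itself stays unchanged. A non-Gaussian response would additionally require working weights and a working response in each iteration, which this sketch omits.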
Cite this article
Oelker, MR., Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv Data Anal Classif 11, 97–120 (2017). https://doi.org/10.1007/s11634-015-0205-y
DOI: https://doi.org/10.1007/s11634-015-0205-y
Keywords
- Model selection
- Penalties
- Generalized linear model (GLM)
- Structured regression
- Ridge
- Lasso
- Group Lasso
- SCAD
- Elastic net
- Fused Lasso