Abstract
In high-dimensional data settings, sparse model fits are desired; these can be obtained through shrinkage or boosting techniques. We investigate classical shrinkage techniques such as the lasso, which is known to yield biased estimates, newer techniques that address this problem, such as the elastic net and SCAD, and the boosting technique CoxBoost together with extensions of it that allow additional structure to be incorporated. To examine whether these methods, which are designed for or frequently used in high-dimensional survival data analysis, also provide sensible results in low-dimensional data settings, we consider the well-known GBSG breast cancer data. Specifically, we study the bias, stability, and sparseness of these model fitting techniques by comparison to the maximum likelihood estimate and via resampling, and their prediction performance via prediction error curve estimates.
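The abstract's remark that the lasso is biased while SCAD is designed to address this can be made concrete through the closed-form thresholding rules the two penalties induce for a single coefficient under an orthonormal design. The following is an illustrative sketch only, not part of the original analysis; the rules and the default a = 3.7 follow Fan and Li (2001):

```python
import math

def soft_threshold(z, lam):
    """Lasso solution for one coefficient under orthonormal design:
    every nonzero estimate is shrunk toward zero by lam, which is
    the source of the lasso's bias."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def scad_threshold(z, lam, a=3.7):
    """SCAD solution under orthonormal design (Fan & Li 2001):
    small coefficients are soft-thresholded (kept sparse), large
    coefficients are left unshrunk (nearly unbiased)."""
    if abs(z) <= 2 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:
        # transition region interpolating between the two regimes
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z  # no shrinkage for large coefficients

if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 5.0):
        print(z, soft_threshold(z, 1.0), scad_threshold(z, 1.0))
```

For a large underlying effect (e.g. z = 5, lam = 1) the lasso returns 4, shrunk by the full penalty, whereas SCAD returns 5 unchanged, which is precisely the bias issue the comparison in the paper revolves around.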
Porzelius, C., Schumacher, M. & Binder, H. Sparse regression techniques in low-dimensional survival data settings. Stat Comput 20, 151–163 (2010). https://doi.org/10.1007/s11222-009-9155-6