Abstract
In regression models with a grouping structure among the explanatory variables, variable selection at both the group level and the within-group individual-variable level is important for improving model accuracy and interpretability. In this article, we propose a hierarchical bi-level variable selection approach for the linear part of a partially linear additive hazards model fitted to censored survival data, where the covariates are naturally grouped. The proposed method conducts group selection and individual variable selection within the selected groups simultaneously. Computational algorithms are developed, and the asymptotic rates and selection consistency of the proposed estimators are established. Simulation results indicate that the proposed method outperforms several existing penalties, such as LASSO, SCAD, and adaptive LASSO. The method is illustrated with an application to the Mayo Clinic primary biliary cirrhosis (PBC) data.
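To make the idea of bi-level selection concrete, the following is a minimal sketch of how a single shrinkage step can zero out an entire group while also zeroing individual coefficients inside a retained group. It uses the proximal operator of the sparse-group lasso penalty as a stand-in; this is an assumption for illustration only and is not the hierarchical penalty or the algorithm developed in the article. The function names, the toy coefficients, and the tuning values `lam_within` and `lam_group` are all hypothetical.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_prox(beta, groups, lam_within, lam_group):
    """Proximal operator of lam_within*||b||_1 + lam_group*sum_g ||b_g||_2.

    beta   : list of coefficients (illustrative values, not fitted estimates)
    groups : list of index lists, one per non-overlapping group
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # Step 1: individual-level shrinkage inside the group.
        shrunk = [soft_threshold(beta[j], lam_within) for j in idx]
        norm = math.sqrt(sum(v * v for v in shrunk))
        # Step 2: group-level shrinkage; the whole group is dropped
        # when its remaining norm does not exceed lam_group.
        scale = max(1.0 - lam_group / norm, 0.0) if norm > 0 else 0.0
        for j, v in zip(idx, shrunk):
            out[j] = scale * v
    return out


# Toy example: two groups of three covariates each (hypothetical numbers).
beta = [3.0, 0.2, -2.0, 0.3, -0.1, 0.25]
groups = [[0, 1, 2], [3, 4, 5]]
result = sparse_group_prox(beta, groups, lam_within=0.5, lam_group=0.5)
selected = [j for j, b in enumerate(result) if b != 0.0]
print(selected)
```

In this toy run the second group is removed entirely, while the first group is kept with its weak middle coefficient set to zero, which is the pattern of sparsity that bi-level selection aims for.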



Acknowledgements
Lu’s research was partially supported by a Discovery Grant (RG/PIN06466-2018) from the Natural Sciences and Engineering Research Council (NSERC) of Canada. Yang’s research was supported by the National Natural Science Foundation of China (Grant 11801168), the Natural Science Foundation of Hunan Province (Grant 2018JJ3322), the Scientific Research Fund of Hunan Provincial Education Department (Grant 18B024), and the China Scholarship Council, which supported his visit to the University of California, Riverside. The authors are also grateful to the associate editor and the referees for their insightful comments and suggestions, which have greatly improved the paper.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Afzal, A.R., Yang, J. & Lu, X. Variable selection in partially linear additive hazards model with grouped covariates and a diverging number of parameters. Comput Stat 36, 829–855 (2021). https://doi.org/10.1007/s00180-020-01062-3
DOI: https://doi.org/10.1007/s00180-020-01062-3