
Variable selection in partially linear additive hazards model with grouped covariates and a diverging number of parameters

  • Original paper
Computational Statistics

Abstract

In regression models whose explanatory variables have a grouping structure, variable selection at both the group level and the individual-variable level within groups is important for improving model accuracy and interpretability. In this article, we propose a hierarchical bi-level variable selection approach for censored survival data in the linear part of a partially linear additive hazards model with naturally grouped covariates. The proposed method performs group selection and individual variable selection within the selected groups simultaneously. We develop computational algorithms and establish the asymptotic convergence rates and selection consistency of the proposed estimators. Simulation results indicate that the proposed method outperforms several existing penalized methods, such as LASSO, SCAD, and adaptive LASSO. The method is illustrated with an application to the Mayo Clinic primary biliary cirrhosis (PBC) data.
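The following is a minimal Python sketch of penalized bi-level selection in the linear part of an additive hazards model, λ(t | Z) = λ0(t) + βᵀZ. Several of its assumptions are ours and not the paper's:

- The nonparametric additive components are omitted. In practice they would need their own approximation, for example a spline basis left unpenalized.
- The unpenalized loss is the least-squares-type objective ½βᵀAβ − bᵀβ implied by the Lin and Ying (1994) estimating equation U(β) = b − Aβ.
- The bi-level penalty is swapped in: it is the sparse group lasso rather than the hierarchical penalty proposed in the paper. Its L1 term performs within-group selection and its group L2 term performs group selection.

The function names `lin_ying_terms`, `prox_sparse_group` and `fit_bilevel` are hypothetical, and the fit uses plain proximal gradient descent.

```python
import numpy as np


def lin_ying_terms(time, event, Z):
    """Return (A, b) so that the Lin-Ying additive hazards estimating
    equation for beta reads U(beta) = b - A @ beta (both scaled by 1/n)."""
    n, p = Z.shape
    A = np.zeros((p, p))
    b = np.zeros(p)
    prev = 0.0
    for t in np.unique(time):              # distinct observed times, ascending
        at_risk = time >= t                # Y_i(s) = 1 for every s in (prev, t]
        zbar = Z[at_risk].mean(axis=0)     # risk-set average Zbar(s) on (prev, t]
        C = Z[at_risk] - zbar
        A += (t - prev) * (C.T @ C)        # integral of sum_i Y_i(s)(Z_i - Zbar(s))^{x2} ds
        died = (time == t) & (event == 1)
        b += (Z[died] - zbar).sum(axis=0)  # each event at t contributes Z_i - Zbar(t)
        prev = t
    return A / n, b / n


def prox_sparse_group(v, groups, step, lam, alpha):
    """Proximal map of step*lam*(alpha*||x||_1 + (1-alpha)*sum_g sqrt(p_g)*||x_g||_2)."""
    # Individual-variable shrinkage: elementwise soft-thresholding.
    out = np.sign(v) * np.maximum(np.abs(v) - step * lam * alpha, 0.0)
    # Group-level shrinkage: a whole group is set to zero when its norm is small.
    for g in groups:
        norm = np.linalg.norm(out[g])
        thr = step * lam * (1 - alpha) * np.sqrt(len(g))
        out[g] = 0.0 if norm <= thr else out[g] * (1 - thr / norm)
    return out


def fit_bilevel(A, b, groups, lam, alpha=0.5, max_iter=50000, tol=1e-10):
    """Proximal gradient descent on 0.5*beta'A beta - b'beta + penalty."""
    step = 1.0 / np.linalg.eigvalsh(A).max()  # 1/L, L = Lipschitz constant of the gradient
    beta = np.zeros(len(b))
    for _ in range(max_iter):
        grad = A @ beta - b                   # gradient of the quadratic loss
        new = prox_sparse_group(beta - step * grad, groups, step, lam, alpha)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


# Simulated example: a constant baseline hazard of 1 keeps 1 + beta'Z positive.
rng = np.random.default_rng(0)
n = 400
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]           # hypothetical grouping
beta_true = np.array([1.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
Z = rng.uniform(0.0, 1.0, size=(n, 9))
T = rng.exponential(1.0 / (1.0 + Z @ beta_true))     # failure times
cens = rng.exponential(2.0, size=n)                  # censoring times
time = np.minimum(T, cens)
event = (T <= cens).astype(int)

A, b = lin_ying_terms(time, event, Z)
beta_hat = fit_bilevel(A, b, groups, lam=0.01)
print(np.round(beta_hat, 3))
```

The grouping, the true coefficients and λ = 0.01 are arbitrary illustrations. In practice λ and the mixing weight α would be tuned, for example by cross-validation or an information criterion. Setting α = 0 reduces the penalty to a group lasso (group selection only), and α = 1 reduces it to the lasso (individual selection only).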

Fig. 1
Fig. 2
Fig. 3

Acknowledgements

Lu’s research was partially supported by a Discovery Grant (RG/PIN06466-2018) from the Natural Sciences and Engineering Research Council of Canada (NSERC). Yang’s research was supported by the National Natural Science Foundation of China (Grant 11801168), the Natural Science Foundation of Hunan Province (Grant 2018JJ3322), the Scientific Research Fund of Hunan Provincial Education Department (Grant 18B024), and the China Scholarship Council, which supported his visit to the University of California, Riverside. The authors are also grateful to the associate editor and the referees for their insightful comments and suggestions, which have greatly improved the paper.

Author information

Corresponding author

Correspondence to Xuewen Lu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supplementary Information

Supplementary material 1 (pdf 251 KB)

About this article

Cite this article

Afzal, A.R., Yang, J. & Lu, X. Variable selection in partially linear additive hazards model with grouped covariates and a diverging number of parameters. Comput Stat 36, 829–855 (2021). https://doi.org/10.1007/s00180-020-01062-3

  • DOI: https://doi.org/10.1007/s00180-020-01062-3
