Abstract
In this article, we investigate bi-level variable selection in semiparametric transformation models when a grouping structure of the covariates is available. This large class of transformation models includes the Cox proportional hazards model and the proportional odds model as special cases. For this class of models, there are only a few works on variable selection, and all of the existing selection methods operate at the individual-variable level. To fill the gap of variable selection at both the group and individual levels, we propose a penalized nonparametric maximum likelihood estimation method with three different penalties, namely group bridge (GB), adaptive group bridge (AGB), and composite group bridge (CGB), and develop their respective computational algorithms. Further, we prove that the estimators resulting from AGB and CGB have desirable oracle properties. Our simulation studies demonstrate that all three penalties work well in bi-level variable selection, while AGB and CGB outperform GB when within-group sparsity is present. The proposed methods are applied to two real datasets for illustration.
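As a rough illustration of what a group bridge penalty looks like, the sketch below evaluates it in its commonly used form, lam * sum_j c_j * (sum of |beta_k| over group j)^gamma with 0 < gamma < 1. The abstract does not state the exact form or weights used in this article, so the function name, the group weights c_j = |A_j|^(1 - gamma), and the default gamma = 0.5 are all assumptions made for illustration. The sketch only computes the penalty term; it does not fit a transformation model.

```python
import numpy as np


def group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5):
    """Evaluate a group bridge penalty (illustrative form, not the article's code).

    beta   : 1-D array of regression coefficients
    groups : list of index lists, one per covariate group
    lam    : overall tuning parameter (assumed name)
    gamma  : bridge exponent, assumed to lie in (0, 1)
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for idx in groups:
        idx = np.asarray(idx, dtype=int)
        # Assumed group weight: group size raised to (1 - gamma).
        c_j = idx.size ** (1.0 - gamma)
        # L1 norm of the coefficients in this group.
        l1_norm = np.abs(beta[idx]).sum()
        # A group whose coefficients are all zero contributes nothing.
        total += c_j * l1_norm ** gamma
    return lam * total


# Two groups of two covariates each; the first group is entirely zero.
penalty = group_bridge_penalty([0.0, 0.0, 1.0, -3.0], [[0, 1], [2, 3]])
```

Because the exponent gamma is applied to the group-wise L1 norm, the penalty can shrink a whole group to zero while the L1 norm inside the group still allows individual coefficients to vanish, which is the sense in which such a penalty acts at both levels.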
Acknowledgements
The authors gratefully acknowledge the support for this research through Discovery Grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada. The authors are also grateful to the editor, the associate editor, and the referees for their insightful comments and suggestions, which have greatly improved the paper.
About this article
Cite this article
Zhong, W., Lu, X. & Wu, J. Bi-level variable selection in semiparametric transformation models with right-censored data. Comput Stat 36, 1661–1692 (2021). https://doi.org/10.1007/s00180-021-01075-6