Weighted least squares model averaging for accelerated failure time models
Introduction
In survival analysis, the accelerated failure time (AFT) model has received extensive attention and has become an important alternative to the Cox model, since it describes the effects of covariates on the event time more naturally and directly than the Cox model does (Kalbfleisch and Prentice, 2011). Various strategies have been proposed to estimate the parameters of the AFT model, including Miller's estimator (Miller, 1976), the Buckley-James estimator (Buckley and James, 1979; Jin et al., 2006), the KSV estimator (Koul et al., 1981), and the weighted least squares (WLS) estimator (Stute, 1993, 1996; He and Huang, 2003), which is the focus of this paper. Compared with the other estimators, the WLS estimator has three major advantages. First, it is easy to compute because no iterations are required. Second, it is consistent and asymptotically normal under reasonable assumptions. Third, the comprehensive simulation studies in Bao et al. (2007) show that it performs much better than the other estimators, particularly when the number of covariates is large or the censoring is heavy.
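To make the "no iterations" point concrete, the following is a minimal sketch of a WLS fit with Kaplan-Meier jump weights, in the form usually attributed to Stute (1993). It is an illustration under stated assumptions rather than the estimator exactly as defined later in this paper: the function names stute_weights and wls_line are hypothetical, y is assumed to hold the logarithm of the observed (possibly censored) time, delta is 1 for an observed event and 0 for a censored one, and only a single covariate with an intercept is handled so that the fit has a closed form.

```python
def stute_weights(y, delta):
    """Kaplan-Meier jump weights for right-censored responses.

    y     : observed responses (assumed: log of min(survival, censoring) time)
    delta : 1 if the event was observed, 0 if censored
    Censored observations receive weight zero.
    """
    n = len(y)
    # Sort by response; at ties, uncensored observations come first.
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    w = [0.0] * n
    running = 1.0  # product over earlier ranks of ((n-j)/(n-j+1))**delta_(j)
    for rank, i in enumerate(order, start=1):
        if delta[i] == 1:
            w[i] = running / (n - rank + 1)
        running *= ((n - rank) / (n - rank + 1)) ** delta[i]
    return w


def wls_line(x, y, w):
    """Closed-form minimizer of sum_i w_i * (y_i - a - b * x_i)**2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Illustrative data (made up): the second observation is censored.
weights = stute_weights([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
intercept, slope = wls_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], weights)
```

Because the weights are computed in one pass over the sorted data and the regression step is an ordinary weighted least squares solve, no iterative refitting is involved. Without censoring, every weight reduces to 1/n and the fit coincides with ordinary least squares.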
In some practical problems, we need to choose useful covariates from many potential ones. Earlier model selection methods were based on information criteria such as AIC and BIC. Later, regularization methods became popular, including those of Tibshirani (1996), Fan and Li (2001), Zou (2006), Lv and Fan (2009), and Dai et al. (2018). For model selection in the AFT model, several methods are based on the penalized weighted least squares estimator, such as Huang et al. (2006), Hu and Chai (2013), and Cheng et al. (2022). However, when no single model is overwhelmingly supported by the data, these model selection methods may ignore the contributions of other candidate models and suffer from model selection uncertainty and bias (Hjort and Claeskens, 2003). More importantly, when the data change, different model selection methods or criteria may lead to different optimal models.
To address these issues and improve prediction accuracy, various model averaging approaches have been proposed by exploiting all information from every candidate model. Inspired by AIC and BIC, Buckland et al. (1997) proposed smoothed AIC (SAIC) and smoothed BIC (SBIC) methods. Hjort and Claeskens (2003) proposed a local misspecification framework to establish properties of model averaging estimators. Hansen (2007) proposed a model averaging estimator with weights selected by minimizing a Mallows criterion. This Mallows model averaging (MMA) estimator asymptotically achieves the smallest possible squared error in the class of model averaging estimators. Wan et al. (2010) modified the conditions of Hansen (2007) by allowing non-nested candidate models and continuous weights. These improvements make the conditions of MMA more natural at the cost of limiting the number of candidate models. Another important model averaging criterion is the Jackknife model averaging (JMA) proposed by Hansen and Racine (2012), which selects the weights by minimizing a cross-validation criterion and has significantly lower MSE than MMA when the errors are heteroskedastic.
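As a small illustration of the smoothed information criterion idea, the sketch below turns a list of AIC (or BIC) values into averaging weights proportional to exp(-IC/2), normalized to sum to one. The function name smoothed_ic_weights and the example values are hypothetical; subtracting the smallest value before exponentiating is only a numerical safeguard and does not change the resulting weights.

```python
import math


def smoothed_ic_weights(ic_values):
    """Weights proportional to exp(-IC_m / 2), normalized to sum to one."""
    best = min(ic_values)
    # Shift by the smallest criterion value to avoid underflow in exp().
    raw = [math.exp(-0.5 * (v - best)) for v in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


# Illustrative criterion values for three candidate models (made up).
weights = smoothed_ic_weights([100.0, 102.0, 110.0])
```

Models with a lower criterion value receive more weight, but no candidate is discarded outright, which is the contrast with selecting the single best-scoring model.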
In survival analysis, MMA and JMA, the most representative frequentist model averaging criteria, have not been used until recently. Under the proportional hazards model assumption, He et al. (2020) improved the prediction accuracy of the integral intensity function by JMA, and Li et al. (2021) proposed a semiparametric model averaging prediction method to approximate the nonparametric regression function by a weighted sum of low-dimensional nonparametric submodels.
As for the AFT model, Yan et al. (2021) proposed a high-dimensional JMA procedure, in which the penalized Buckley-James method (Wang et al., 2008) was used to obtain the coefficient estimators. However, convergence of the Buckley-James estimate cannot be guaranteed, and Yan et al. (2021) do not consider the possible overlap of variables across candidate models. Recently, Liang et al. (2022) proposed another model averaging method based on the KSV estimate and the MMA criterion. As shown by Bao et al. (2007), in many cases the KSV estimator performs worse than the WLS estimator. Moreover, constructing a linear model for the synthetic response may suffer from excessive error variance.
Therefore, in this paper, we propose the weighted least squares model averaging (WLSMA) method under the AFT model, in which the averaging weights are selected by minimizing a Mallows-type criterion. We show that the proposed method is asymptotically optimal in the sense of Li (1986). In particular, as the variances of the error terms are unknown in many applications, we also consider the estimation of the variance in the Mallows criterion. We prove that even when the variances of the error terms are estimated and the feasible weighted least squares estimators are averaged, our method remains asymptotically optimal. Asymptotic optimality is the most important theoretical property of model averaging methods, and this result is one of the main theoretical contributions of this paper. In addition, our method allows continuous weights, and the variables in the candidate models can overlap, which greatly improves the flexibility and applicability of the method. Extensive simulations show that our WLSMA method outperforms many existing model selection and model averaging methods. In the empirical study of the PBC dataset, the WLSMA method also achieves good prediction accuracy.
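To illustrate what selecting continuous weights by a Mallows criterion involves, the sketch below minimizes the generic criterion of Hansen (2007), the residual sum of squares of the averaged fit plus 2 * sigma2 times the weighted number of parameters, over weights that are nonnegative and sum to one. This is an assumption-laden illustration, not the criterion of this paper, which is built on weighted least squares fits for censored data. The names mallows_criterion and mallows_weights are hypothetical, and the Frank-Wolfe iteration with exact line search is just one simple way to solve the underlying quadratic program.

```python
def mallows_criterion(w, y, fits, k, sigma2):
    """RSS of the averaged fit plus 2 * sigma2 * (weighted parameter count)."""
    n = len(y)
    avg = [sum(w[m] * fits[m][i] for m in range(len(w))) for i in range(n)]
    rss = sum((y[i] - avg[i]) ** 2 for i in range(n))
    return rss + 2.0 * sigma2 * sum(wm * km for wm, km in zip(w, k))


def mallows_weights(y, fits, k, sigma2, iters=500):
    """Minimize the criterion over the weight simplex by Frank-Wolfe steps.

    fits[m] : fitted values of candidate model m
    k[m]    : number of parameters in candidate model m
    sigma2  : error variance (assumed known or estimated beforehand)
    """
    n_models, n = len(fits), len(y)
    w = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(iters):
        avg = [sum(w[m] * fits[m][i] for m in range(n_models)) for i in range(n)]
        resid = [y[i] - avg[i] for i in range(n)]
        grad = [
            -2.0 * sum(fits[m][i] * resid[i] for i in range(n)) + 2.0 * sigma2 * k[m]
            for m in range(n_models)
        ]
        s = min(range(n_models), key=grad.__getitem__)  # best single model
        step_fit = [fits[s][i] - avg[i] for i in range(n)]
        denom = sum(v * v for v in step_fit)
        if denom == 0.0:
            break
        k_dir = k[s] - sum(wm * km for wm, km in zip(w, k))
        # Exact line search for the quadratic criterion, clipped to [0, 1].
        gamma = (sum(step_fit[i] * resid[i] for i in range(n)) - sigma2 * k_dir) / denom
        gamma = max(0.0, min(1.0, gamma))
        if gamma == 0.0:
            break  # no descent direction remains
        w = [(1.0 - gamma) * wm for wm in w]
        w[s] += gamma
    return w


# Illustrative inputs (made up): a constant fit and a perfect linear fit.
y = [1.0, 2.0, 3.0, 4.0]
fits = [[2.5, 2.5, 2.5, 2.5], [1.0, 2.0, 3.0, 4.0]]
weights = mallows_weights(y, fits, k=[1, 2], sigma2=0.0)
```

Because each step moves the current weights toward a single candidate model, the weights stay on the simplex throughout. The candidate fits can come from overlapping, non-nested covariate sets, since only the fitted values and parameter counts enter the criterion.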
The rest of the paper is organized as follows. Section 2 introduces the notation, the AFT model and the WLS estimate. Section 3 proposes the WLSMA method and establishes its asymptotic optimality. Sections 4 and 5 report the simulation results and the application to the PBC dataset. Finally, Section 6 provides some concluding remarks, and the Appendix outlines the proofs of the theorems.
Section snippets
Notations and model
Let T and V denote the survival time and censored time, respectively. and . denotes the covariate matrix for N independent observations, where the dimension of is countably infinite. The AFT model assumes with and . We consider a sequence of linear approximating models , where the mth model, with any regressors belonging to , takes the form of
The proposed model averaging method
Let n denote the number of uncensored observations in all N observations, and denote the diagonal matrix consisting of the non-zero elements in . Similarly, denote as the submatrix of composed of the n uncensored individuals' covariates under the mth candidate model.
Since the weight for the censored individuals, the WLS estimator (2) of in the mth () model can be rewritten as where is the corresponding uncensored
Simulation
In the simulation study, the data are generated from the AFT model, , where follows the normal distribution . The censoring time is generated from . By adjusting the value of , the censoring rate (CR) is about 20%, 35% and 50%. We set and , and consider different cases about the selection of candidate models and true values of β.
- Case 1
(The nested models): We assume that only the first covariates could be observed. The mth model uses
Applications
In this section, we evaluate the prediction performance of the proposed WLSMA method on a real dataset. The Mayo Clinic has established a dataset of 424 patients with primary biliary cirrhosis (PBC), which includes complete data on 17 covariates from 276 patients. The survival time of interest is the number of days between registration and death. Patients who underwent liver transplantation or were still alive at the end of the study were considered right censored. Since we can never know the real
Discussion
To overcome model selection uncertainty and improve prediction accuracy, this paper proposes a WLS model averaging method based on the Mallows criterion for the AFT model with right-censored data. Simulation results demonstrate the good performance of the proposed WLSMA method. In addition, its asymptotic optimality is proved under certain mild conditions.
Note that although the proposed method does not require nested candidate models, the construction of candidate models is
References (32)
- Bao et al. (2007). The Koul–Susarla–Van Ryzin and weighted least squares estimates for censored linear regression model: a comparative study. Comput. Stat. Data Anal.
- Dai et al. (2018). Broken adaptive ridge regression and its asymptotic properties. J. Multivar. Anal.
- Hansen and Racine (2012). Jackknife model averaging. J. Econom.
- Hu and Chai (2013). Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates. J. Multivar. Anal.
- Stute (1993). Consistent estimation under random censorship when covariables are present. J. Multivar. Anal.
- Wan et al. (2010). Least squares model averaging by Mallows criterion. J. Econom.
- Yan et al. (2021). Optimal model averaging forecasting in high-dimensional survival analysis. Int. J. Forecast.
- Buckland et al. (1997). Model selection: an integral part of inference. Biometrics.
- Buckley and James (1979). Linear regression with censored data. Biometrika.
- Cheng et al. (2022). -regularized high-dimensional accelerated failure time model. Comput. Stat. Data Anal.
- Prognosis in primary biliary cirrhosis: model for decision making. Hepatology.
- Least angle regression. Ann. Stat.
- Fan and Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc.
- Hansen (2007). Least squares model averaging. Econometrica.
- He et al. (2020). Functional martingale residual process for high-dimensional Cox regression with model averaging. J. Mach. Learn. Res.
- He and Huang (2003). Central limit theorem of linear regression model under right censorship. Sci. China Ser. A, Math.