Weighted least squares model averaging for accelerated failure time models

https://doi.org/10.1016/j.csda.2023.107743

Abstract

This paper proposes a new model averaging method for accelerated failure time models with right-censored data. A weighted least squares procedure is used to estimate the parameters of the candidate models. In this procedure, the candidate models are not required to be nested, and the weights selected by the Mallows criterion are not restricted to be discrete, which makes the proposed method flexible and general. The asymptotic optimality of the proposed method is proved under some mild conditions. In particular, it is shown that the optimality remains valid even when the variances of the error terms are estimated and the feasible weighted least squares estimators are averaged. Simulation studies show that the proposed method has better prediction performance than many popular model selection or model averaging methods when all candidate models are misspecified. Finally, an application to primary biliary cirrhosis data is provided.

Introduction

In survival analysis, the accelerated failure time (AFT) model has received extensive attention and has become an important alternative to the Cox model, since it describes the covariate effects on the event time more naturally and directly than the Cox model does (Kalbfleisch and Prentice, 2011). Various strategies have been proposed to estimate the parameters in the AFT model, including Miller's estimator (Miller, 1976), the Buckley-James estimator (Buckley and James, 1979; Jin et al., 2006), the KSV estimator (Koul et al., 1981), and the weighted least squares (WLS) estimator (Stute, 1993, 1996; He and Huang, 2003), which is the one used in this paper. Compared with the other estimators, the WLS estimator has three major advantages. First, it is easy to compute because no iterations are required. Second, it is consistent and asymptotically normal under reasonable assumptions. Third, comprehensive simulation studies in Bao et al. (2007) show that it performs much better than the other estimators, particularly when the number of covariates is large or the censoring is heavy.

In some practical problems, we need to choose useful covariates from many potential ones. Earlier model selection methods were based on information criteria such as AIC and BIC. Later, regularization methods became popular, including those of Tibshirani (1996), Fan and Li (2001), Zou (2006), Lv and Fan (2009) and Dai et al. (2018). For model selection in the AFT model, several methods are based on the penalized weighted least squares estimator, such as those of Huang et al. (2006), Hu and Chai (2013) and Cheng et al. (2022). However, when no single model is overwhelmingly supported by the data, these model selection methods may ignore the contributions of other candidate models and suffer from model selection uncertainty and bias (Hjort and Claeskens, 2003). More importantly, when the data change, different model selection methods or criteria may lead to different optimal models.

To address these issues and improve prediction accuracy, various model averaging approaches have been proposed by exploiting all information from every candidate model. Inspired by AIC and BIC, Buckland et al. (1997) proposed smoothed AIC (SAIC) and smoothed BIC (SBIC) methods. Hjort and Claeskens (2003) proposed a local misspecification framework to establish properties of model averaging estimators. Hansen (2007) proposed a model averaging estimator with weights selected by minimizing a Mallows criterion. This Mallows model averaging (MMA) estimator asymptotically achieves the smallest possible squared error in the class of model averaging estimators. Wan et al. (2010) modified the conditions of Hansen (2007) by allowing non-nested candidate models and continuous weights. These improvements make the conditions of MMA more natural at the cost of limiting the number of candidate models. Another important model averaging criterion is the Jackknife model averaging (JMA) proposed by Hansen and Racine (2012), which selects the weights by minimizing a cross-validation criterion and has significantly lower MSE than MMA when the errors are heteroskedastic.

In survival analysis, MMA and JMA, the most representative frequentist model averaging criteria, have not been used until recently. Under the proportional hazards model assumption, He et al. (2020) improved the prediction accuracy of the integral intensity function by JMA, and Li et al. (2021) proposed a semiparametric model averaging prediction method to approximate the nonparametric regression function by a weighted sum of low-dimensional nonparametric submodels.

As for the AFT model, Yan et al. (2021) proposed a high-dimensional JMA procedure, in which the penalized Buckley-James method (Wang et al., 2008) was used to obtain the coefficient estimators. However, the convergence of the Buckley-James estimate cannot be guaranteed, and the possible overlap of variables in different candidate models is not considered in Yan et al. (2021). Recently, Liang et al. (2022) proposed another model averaging method based on the KSV estimate and the MMA criterion. As shown by Bao et al. (2007), in many cases the KSV estimator does not perform as well as the WLS estimator. Moreover, constructing a linear model for the synthetic response may face the problem of excessive error variance.

Therefore, in this paper we propose the weighted least squares model averaging (WLSMA) method under the AFT model, where the averaging weights are selected by minimizing a Mallows criterion. We show that the proposed method is asymptotically optimal in the sense of Li (1986). Since the variances of the error terms are unknown in many applications, we also consider estimating the variance in the Mallows criterion, and we prove that the method remains asymptotically optimal even when the variances of the error terms are estimated and the feasible weighted least squares estimators are averaged. Asymptotic optimality is the most important theoretical property of a model averaging method, and this result is one of the main theoretical contributions of this paper. In addition, our method allows continuous weights, and the variables in different candidate models may overlap, which greatly improves the flexibility and applicability of the method. Extensive simulations show that our WLSMA method outperforms many existing model selection and model averaging methods. In the empirical study of the PBC dataset, the WLSMA method also achieves good prediction accuracy.

The rest of the paper is organized as follows. Section 2 introduces the notation, the AFT model and the WLS estimate. In Section 3, we propose our WLSMA method and establish its asymptotic optimality. Sections 4 and 5 report the simulation results and the application to the PBC dataset. Finally, we provide some concluding remarks in Section 6 and outline the proofs of the theorems in the Appendix.


Notations and model

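Let $T$ and $V$ denote the survival time and the censoring time, respectively, and let $\tilde{Y} = \log T$ and $C = \log V$. $X = (x_1, x_2, \ldots, x_N)$ denotes the covariate matrix for $N$ independent observations, where the dimension of $x_i = (x_{i1}, x_{i2}, \ldots)$ is countably infinite. The AFT model assumes

$$\tilde{Y}_i = \tilde{\mu}_i + e_i = \sum_{j=1}^{\infty} \beta_j x_{ij} + e_i, \quad i = 1, \ldots, N,$$

with $E(e_i \mid x_i) = 0$ and $E(e_i^2 \mid x_i) = \sigma^2$. We consider a sequence of linear approximating models $m = 1, \ldots, M$, where the $m$th model, with any $k_m$ ($> 0$) regressors belonging to $x_i$, takes the form

$$\tilde{Y}_i = \sum_{j=1}^{k_m} \beta_{j,m} x_{ij,m} + e_i, \quad i = 1, \ldots, N,$$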

The proposed model averaging method

Let $n$ denote the number of uncensored observations among all $N$ observations, and let $D = \mathrm{diag}(d_1, \ldots, d_n)$ denote the diagonal matrix consisting of the non-zero elements in $\{a_i\}_{i=1}^{N}$. Similarly, denote by $Z_m$ the $n \times k_m$ submatrix of $X_m$ composed of the $n$ uncensored individuals' covariates under the $m$th candidate model.

Since the weight $a_i = 0$ for the censored individuals, the WLS estimator (2) of $\beta_m$ in the $m$th ($m = 1, \ldots, M$) model can be rewritten as

$$\hat{\beta}_m = (Z_m^{\top} D Z_m)^{-1} Z_m^{\top} D Y,$$

where $Y = (Y_1, \ldots, Y_n)$ is the corresponding uncensored
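The weighted least squares step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the definition of the weights $a_i$ is not shown in this excerpt, so the sketch assumes they are the Kaplan-Meier weights of Stute (1993), and it assumes there are no ties among the observed log-times. The function names `km_weights` and `wls_aft` are hypothetical.

```python
import numpy as np

def km_weights(y, delta):
    """Kaplan-Meier (Stute-type) weights, returned in the original order.

    ASSUMPTION: the a_i of the text are these weights; ties in y are not
    handled specially. y holds observed log-times, delta is 1 if uncensored.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = np.asarray(delta, dtype=float)[order]
    w = np.zeros(n)
    prod = 1.0  # running product over earlier ranks
    for i in range(n):  # i is the 0-based rank, so N - rank + 1 == n - i
        w[i] = d[i] / (n - i) * prod
        prod *= ((n - i - 1) / (n - i)) ** d[i]
    out = np.empty(n)
    out[order] = w  # undo the sort
    return out

def wls_aft(X, y, delta):
    """WLS estimate (Z' D Z)^{-1} Z' D Y using only the uncensored rows."""
    a = km_weights(y, delta)
    keep = a > 0                      # censored observations have weight 0
    Z, D, Y = np.asarray(X)[keep], a[keep], np.asarray(y, dtype=float)[keep]
    ZtD = Z.T * D                     # same as Z' @ diag(D), without forming diag(D)
    return np.linalg.solve(ZtD @ Z, ZtD @ Y)
```

With no censoring every weight equals $1/N$, so `wls_aft` reduces to ordinary least squares, which gives a quick sanity check of the sketch.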

Simulation

In the simulation study, the data are generated from the AFT model

$$\log(T_i) = \tilde{Y}_i = \sum_{j=1}^{p} \beta_j x_{ij} + e_i,$$

where $e_i$ follows the normal distribution $N(0, 1)$. The censoring time $C_i$ is generated from $N(C_0, 2)$. The value of $C_0$ is adjusted so that the censoring rate (CR) is about 20%, 35% or 50%. We set $N = 100, 200$ and $p = 100$, and consider different cases for the choice of candidate models and the true values of $\beta$.

  • Case 1 (The nested models): We assume that only the first $3N^{1/3}$ covariates could be observed. The mth model uses
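The data-generating setup described at the start of this section can be sketched as follows. Several details are not given in this excerpt and are filled with clearly hypothetical placeholders: the covariates are drawn as independent standard normals, the coefficients are set to $\beta_j = 1/j$, and the 2 in $N(C_0, 2)$ is read as a variance. The helper names `simulate_aft` and `calibrate_c0` are also hypothetical; the calibration uses a Monte Carlo quantile, which is only one possible way to reach a target censoring rate.

```python
import numpy as np

def simulate_aft(N, beta, c0, rng):
    """Draw one right-censored sample from the AFT model on the log scale."""
    p = len(beta)
    X = rng.standard_normal((N, p))            # ASSUMPTION: covariate law not in excerpt
    y_true = X @ beta + rng.standard_normal(N)  # errors e_i ~ N(0, 1)
    C = rng.normal(c0, np.sqrt(2.0), N)        # ASSUMPTION: 2 is the variance
    Y = np.minimum(y_true, C)                  # observed log-time
    delta = (y_true <= C).astype(int)          # 1 = uncensored, 0 = censored
    return X, Y, delta

def calibrate_c0(target_cr, beta, rng, n_mc=20000):
    """Pick C0 so that P(log T > C) is roughly target_cr (Monte Carlo)."""
    p = len(beta)
    y = rng.standard_normal((n_mc, p)) @ beta + rng.standard_normal(n_mc)
    noise = rng.normal(0.0, np.sqrt(2.0), n_mc)
    # Censoring occurs when y - noise > C0, so C0 is an upper quantile of y - noise.
    return float(np.quantile(y - noise, 1.0 - target_cr))

rng = np.random.default_rng(2023)
p = 100
beta = 1.0 / np.arange(1, p + 1)               # HYPOTHETICAL coefficients
c0 = calibrate_c0(0.35, beta, rng)
X, Y, delta = simulate_aft(200, beta, c0, rng)
censoring_rate = 1.0 - delta.mean()
```

Calibrating $C_0$ on a large auxiliary sample keeps the target rate stable across replications, while the realized rate in any single sample of size 100 or 200 will still fluctuate around it.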

Applications

In this section, we evaluate the prediction performance of the proposed WLSMA method on a real dataset. The Mayo Clinic has established a dataset of 424 patients with primary biliary cirrhosis (PBC), which includes complete data on 17 covariates from 276 patients. The survival time of interest is the number of days between registration and death. Patients who underwent liver transplantation or were still alive at the end of the study were considered right censored. Since we can never know the real

Discussion

To overcome model selection uncertainty and improve prediction accuracy, this paper proposes a WLS model averaging method based on the Mallows criterion for the AFT model with right-censored data. Simulation results demonstrate the good performance of the proposed WLSMA method. In addition, its asymptotic optimality is proved under certain mild conditions.

Note that although the proposed method does not require nested candidate models, the construction of candidate models is

References (32)

  • E.R. Dickson et al. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology (1989)
  • B. Efron et al. Least angle regression. Ann. Stat. (2004)
  • J. Fan et al. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. (2001)
  • B.E. Hansen. Least squares model averaging. Econometrica (2007)
  • B. He et al. Functional martingale residual process for high-dimensional Cox regression with model averaging. J. Mach. Learn. Res. (2020)
  • S. He et al. Central limit theorem of linear regression model under right censorship. Sci. China Ser. A, Math. (2003)