ℓ0-Regularized high-dimensional accelerated failure time model

doi:10.1016/j.csda.2022.107430

Computational Statistics & Data Analysis

Volume 170, June 2022, 107430


Abstract

We develop a constructive approach for $ℓ_{0}$-penalized estimation in the sparse accelerated failure time (AFT) model with high-dimensional covariates. The proposed approach is based on Stute's weighted least squares criterion combined with $ℓ_{0}$-penalization. The method is a computational algorithm that generates a sequence of solutions iteratively, based on active sets derived from primal and dual information and root finding according to the Karush-Kuhn-Tucker (KKT) conditions. We refer to the proposed method as AFT-SDAR (for support detection and root finding). An important aspect of our theoretical results is that they directly concern the sequence of solutions generated by the AFT-SDAR algorithm. We prove that the estimation errors of the solution sequence decay exponentially to the optimal error bound with high probability, as long as the covariate matrix satisfies a mild regularity condition which is necessary and sufficient for model identification even in the setting of high-dimensional linear regression. An adaptive version of AFT-SDAR, called AFT-ASDAR, is also proposed; it determines the support size of the estimated coefficient in a data-driven fashion. Simulation studies demonstrate the superior performance of the proposed method over the lasso and MCP in terms of accuracy and speed. The application of the proposed method is also illustrated by analyzing a real data set.

Introduction

In survival analysis, an attractive alternative to the widely used proportional hazards model (Cox, 1972) is the accelerated failure time (AFT) model (Koul et al., 1981; Wei, 1992; Kalbfleisch and Prentice, 2011). The AFT model is a linear regression model in which the response variable is usually the logarithm or a known monotone transformation of the failure time. Let $T_{i}$ be the (possibly transformed) failure time and $x_{i}$ be a p-dimensional covariate vector for the ith subject in a random sample of size n. The AFT model assumes $T_{i} = x_{i}^{⊤} β^{⁎} + ϵ_{i}, i = 1, \dots, n,$ where $β^{⁎} \in R^{p}$ is the underlying regression coefficient vector and the $ϵ_{i}$'s are random error terms. Often, $T_{i}$ is taken to be the logarithm of the failure time. When $T_{i}$ is subject to right censoring, we only observe $(Y_{i}, δ_{i}, x_{i})$, where $Y_{i} = \min \{T_{i}, C_{i}\}$, $C_{i}$ is the censoring time, and $δ_{i} = 1_{\{T_{i} \leq C_{i}\}}$ is the censoring indicator. Assume that a random sample of i.i.d. observations $(Y_{i}, δ_{i}, x_{i})$, $i = 1, \dots, n$, is available. To estimate $β^{⁎}$ when the distribution of the error terms is unspecified, several approaches have been proposed in the literature. One approach is the Buckley-James estimator (Buckley and James, 1979), which adjusts for censored observations using the Kaplan-Meier estimator. A second approach is the rank-based estimator (Ying, 1993), which is motivated by the score function of the partial likelihood. Another interesting alternative is the weighted least squares approach (Stute et al., 1993; Stute, 1996), which involves the minimization of a weighted least squares objective function.
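To make the observation scheme concrete, the following minimal sketch simulates right-censored data from the AFT model above. The function name `simulate_aft`, the Gaussian covariates and errors, and the exponential censoring times are all illustrative assumptions, not the simulation design used in the paper.

```python
import numpy as np

def simulate_aft(n, p, beta, censor_scale, seed=0):
    """Draw (Y, delta, X) from T = X beta + eps under independent right censoring.

    Hypothetical helper: Gaussian design, Gaussian errors and exponential
    censoring times are assumptions made only for illustration.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    T = X @ beta + rng.standard_normal(n)          # log failure times
    C = np.log(rng.exponential(censor_scale, n))   # log censoring times
    Y = np.minimum(T, C)                           # observed response
    delta = (T <= C).astype(int)                   # 1 = failure observed, 0 = censored
    return Y, delta, X

Y, delta, X = simulate_aft(n=100, p=5, beta=np.array([1.0, -1.0, 0.0, 0.0, 0.5]),
                           censor_scale=10.0)
print("observed censoring rate:", 1.0 - delta.mean())
```

A larger `censor_scale` pushes the censoring times later and so lowers the censoring rate.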

In this paper, we focus on the high-dimensional AFT model, where the dimension of the covariate vector can exceed the sample size. Under the high-dimensional AFT model, many researchers have proposed various methods for parameter estimation and variable selection. For example, Huang et al. (2006) considered the LASSO (Tibshirani, 1996) in the AFT model, based on the weighted least squares criterion; Johnson (2008) and Johnson et al. (2008) applied the SCAD (Fan and Li, 2001) penalty to the rank-based estimator and Buckley-James estimator; Cai et al. (2009) proposed the rank-based adaptive LASSO (Zou, 2006) method; Huang and Ma (2010) used the bridge penalization for the regularized estimation and variable selection; Hu and Chai (2013) extended the MCP (Zhang et al., 2010) penalty to the weighted least squares estimation; Khan and Shaw (2016) used the adaptive and weighted elastic net methods (Zou and Zhang, 2009; Hong and Zhang, 2010) based on the weighted least squares criterion.

We propose the $ℓ_{0}$-penalized method for estimation and variable selection under the high-dimensional AFT model. We extend the support detection and root finding (SDAR) algorithm (Huang et al., 2018) for the linear regression model to the AFT model. For convenience, we refer to the proposed method as AFT-SDAR. In the same spirit as the SDAR method, AFT-SDAR is a constructive approach to estimating the sparse and high-dimensional AFT model. This approach is a computational algorithm motivated by the KKT conditions for the $ℓ_{0}$-penalized weighted least squares solution, and generates a sequence of solutions iteratively, based on support detection using primal and dual information and root finding. Theoretically, we show that the $ℓ_{\infty}$-norm of the estimation errors of the solution sequence decays exponentially to the optimal order $O (\sqrt{\frac{\log p}{n}})$ with high probability, as long as the covariate matrix satisfies the weakest regularity condition that is necessary and sufficient for model identification. Moreover, the estimated support coincides with the true support of the underlying regression coefficient vector if the minimum absolute value of the nonzero entries of the target is above the detectable order.
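The iteration described above (detect a support of size T from primal and dual information, then solve the least squares problem restricted to that support) can be sketched as follows. This follows the SDAR scheme of Huang et al. (2018) applied to weight-scaled data and is only one plausible reading of the approach; the paper's Algorithm 1 is not reproduced in this excerpt. The function name `sdar_weighted`, the way the weights are absorbed into the rows, the default step size `tau=1.0`, and the stopping rule are assumptions.

```python
import numpy as np

def sdar_weighted(X, y, w, T, tau=1.0, max_iter=20):
    """Sketch of support detection and root finding for weighted least squares.

    X: (n, p) covariates, y: (n,) responses, w: (n,) nonnegative weights,
    T: target support size, tau: step size (assumed default).
    """
    n, p = X.shape
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw            # absorb weights into the rows
    beta = np.zeros(p)                          # primal variable, beta^0 = 0
    d = Xw.T @ (yw - Xw @ beta) / n             # dual variable (negative gradient)
    prev = None
    for _ in range(max_iter):
        # Support detection: indices of the T largest |beta + tau * d|.
        score = np.abs(beta + tau * d)
        active = np.sort(np.argsort(-score, kind="stable")[:T])
        if prev is not None and np.array_equal(active, prev):
            break                               # active set has stabilized
        # Root finding: least squares restricted to the detected support.
        beta = np.zeros(p)
        beta[active] = np.linalg.lstsq(Xw[:, active], yw, rcond=None)[0]
        d = Xw.T @ (yw - Xw @ beta) / n
        d[active] = 0.0                         # dual vanishes on the active set
        prev = active
    return beta, active

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [3.0, -3.0, 3.0]
y = X @ beta_true                               # noiseless, uncensored toy data
beta_hat, active = sdar_weighted(X, y, np.ones(200), T=3)
print("detected support:", active)
```

With unit weights this reduces to ordinary least squares on the detected support; censoring enters only through `w`.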

The rest of this paper is organized as follows. In Section 2, we describe the $ℓ_{0}$-penalized criterion for the AFT model. In Section 3, we give the KKT conditions for the $ℓ_{0}$-penalized weighted least squares solutions and describe the proposed AFT-SDAR algorithm. In Section 4, we first establish the finite-step and deterministic error bounds for the solution sequence generated by the AFT-SDAR algorithm. As a consequence of these deterministic error bounds, we provide nonasymptotic error bounds for the solution sequence. We also show that the proposed method recovers the support of the underlying regression coefficient vector in finitely many iterations with high probability. In Section 5, we describe AFT-ASDAR, the adaptive version of AFT-SDAR that selects the tuning parameter in a data-driven fashion. In Section 6, we assess the finite sample performance of the proposed method with different simulation studies and a real case study on a breast cancer gene expression data set. Concluding remarks are given in Section 7. Proofs for all the lemmas and theorems are deferred to the Appendix. An R package implementing the proposed method is available at https://github.com/Shuang-Zhang/ASDAR/.

Section snippets

AFT regression with $ℓ_{0}$-penalization

Let $Y_{(1)}, \dots, Y_{(n)}$ be the order statistics of the $Y_{i}$'s. Let $δ_{(1)}, \dots, δ_{(n)}$ be the associated censoring indicators and let $x_{(1)}, \dots, x_{(n)}$ be the associated covariates. In the weighted least squares method, the weights $w_{(i)}$ are the jumps in the Kaplan-Meier estimator based on $(Y_{(i)}, δ_{(i)})$, $i = 1, \dots, n$, which can be expressed as $w_{(1)} = \frac{δ_{(1)}}{n}, \quad w_{(i)} = \frac{δ_{(i)}}{n - i + 1} \prod_{j = 1}^{i - 1} \left(\frac{n - j}{n - j + 1}\right)^{δ_{(j)}}, \quad i = 2, \dots, n.$ Let $w_{n i} = n w_{(i)}, i = 1, \dots, n$. The weighted least squares criterion is given by $L_{1} (β) = \frac{1}{2 n} \sum_{i = 1}^{n} w_{n i} (Y_{(i)} - x_{(i)}^{⊤} β)^{2}.$ In the high-dimensional
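As a worked illustration of the weight formula and the criterion $L_{1}(β)$ above, here is a minimal sketch. The function names `stute_weights` and `weighted_ls_loss` are hypothetical, and the code assumes there are no ties among the observed times (ties would need a convention for ordering censored and uncensored observations).

```python
import numpy as np

def stute_weights(Y, delta):
    """Kaplan-Meier jumps w_(i) in the order of the sorted responses.

    Returns (order, w), where order sorts Y increasingly and w[i] is the
    weight attached to the (i+1)-th order statistic.
    """
    n = len(Y)
    order = np.argsort(Y, kind="stable")
    d = np.asarray(delta, dtype=float)[order]
    i = np.arange(1, n + 1)
    # Factors ((n - j) / (n - j + 1)) ** delta_(j) for j = 1, ..., n.
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product over j < i: cumulative product shifted by one position.
    prefix = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w = d / (n - i + 1.0) * prefix
    return order, w

def weighted_ls_loss(beta, Y, delta, X):
    """L_1(beta) = (1 / 2n) * sum_i w_ni * (Y_(i) - x_(i)' beta)^2, w_ni = n w_(i)."""
    n = len(Y)
    order, w = stute_weights(Y, delta)
    resid = Y[order] - X[order] @ beta
    return float(np.sum(n * w * resid ** 2) / (2 * n))

# Three observations, the middle one (in time order) censored.
order, w = stute_weights(np.array([2.0, 1.0, 3.0]), np.array([0, 1, 1]))
print(order, w)
```

In this small example the censored observation receives weight zero and its mass is passed on to the later uncensored observation; without censoring every weight equals $1/n$ and $L_{1}$ reduces to the ordinary least squares criterion.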

AFT-SDAR algorithm

We first introduce some notation used throughout the paper. Let ${‖ η ‖}_{q} = {(\sum_{i = 1}^{p} | η_{i} |^{q})}^{\frac{1}{q}}$ be the usual $ℓ_{q}$ norm ($q \in [1, \infty]$) of the vector $η = {(η_{1}, \dots, η_{p})}^{⊤} \in R^{p}$. Let $| A |$ denote the cardinality of the set A. Denote $η_{A} = (η_{i}, i \in A) \in R^{| A |}$, and let $η |_{A} \in R^{p}$ be the vector with ith element ${(η |_{A})}_{i} = η_{i} 1 (i \in A)$, where $1 (\cdot)$ is the indicator function. Let ${‖ η ‖}_{T, \infty}$ and ${‖ η ‖}_{\min}$ be the Tth largest element (in absolute value) and the minimum absolute value of η, respectively. Let ${‖ M ‖}_{\infty}$ denote the maximum absolute value of the entries of the matrix M. Let $\nabla L$
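The notation above can be checked on a small vector. The helper names below (`restrict`, `tth_largest_abs`, `min_abs`) are hypothetical and exist only for this illustration.

```python
import numpy as np

def restrict(eta, A):
    """eta|_A: keep the entries indexed by A, set all others to zero."""
    out = np.zeros_like(eta)
    out[A] = eta[A]
    return out

def tth_largest_abs(eta, T):
    """||eta||_{T, inf}: the T-th largest entry of eta in absolute value."""
    return np.sort(np.abs(eta))[::-1][T - 1]

def min_abs(eta):
    """||eta||_min: the minimum absolute value among the entries of eta."""
    return np.min(np.abs(eta))

eta = np.array([3.0, -1.0, 0.5, -4.0])
A = [0, 3]
print(eta[A])                        # eta_A      -> [ 3. -4.]
print(restrict(eta, A))              # eta|_A     -> [ 3.  0.  0. -4.]
print(tth_largest_abs(eta, 2))       # T = 2      -> 3.0
print(min_abs(eta))                  #            -> 0.5
print(np.linalg.norm(eta, ord=1))    # l_1 norm   -> 8.5
print(np.linalg.norm(eta, ord=np.inf))  # l_inf   -> 4.0
```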

Theoretical properties

In this section, we consider the finite-step error bound for the solution sequence computed based on Algorithm 1. We also study the probabilistic and nonasymptotic $ℓ_{\infty}$ error bound for the solution sequence.

We first consider the deterministic error bounds for the solution sequence generated by AFT-SDAR. We choose a step size τ satisfying $0 < τ < \frac{1}{\sqrt{T} U}$ with $U \geq {‖ \bar{X} ‖}_{2}^{2} / n$, and let L be a constant satisfying $0 < L \leq \frac{σ_{(\min, 2 T)}}{n \sqrt{2 T}}.$

Theorem 1

Suppose $T \geq K$ and set $β^{0} = 0$ in Algorithm 1. If (11) and (12) hold, for the

Adaptive AFT-SDAR

In practice, the sparsity levels of the true parameters $η^{⁎}$ or $β^{⁎}$ are unknown. Therefore, we can regard T as a tuning parameter. Let T increase from 0 to Q, where Q is a large enough integer. In general, we set $Q = α n / \log (n)$ as suggested by Fan and Lv (2008), where α is a positive constant. Then we can obtain a set of solution paths: $\{\hat{η} (T) : T = 0, 1, \dots, Q\}$, where $\hat{η} (0) = 0$. Finally, we use the cross-validation method or the HBIC criterion (Wang et al., 2013) to determine $\hat{T}$ as the estimated
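The selection of $\hat{T}$ along the solution path can be sketched as below. Two assumptions should be flagged clearly. First, the HBIC formula used here, $\log(\mathrm{RSS}/n) + \log(\log n) \log(p) \, |\mathrm{supp}| / n$, is a commonly used form and is assumed rather than taken from this excerpt, which does not state it. Second, `top_t_refit` is a deliberately simple stand-in solver (marginal screening followed by a least squares refit), not AFT-SDAR; any function mapping `(X, y, T)` to a T-sparse estimate could be passed instead. All function names are hypothetical.

```python
import math
import numpy as np

def hbic(X, y, eta):
    """Assumed HBIC form: log(RSS / n) + log(log n) * log(p) * |supp(eta)| / n."""
    n, p = X.shape
    rss = float(np.sum((y - X @ eta) ** 2))
    k = int(np.count_nonzero(eta))
    return math.log(rss / n) + math.log(math.log(n)) * math.log(p) * k / n

def top_t_refit(X, y, T):
    """Stand-in solver (NOT AFT-SDAR): keep the T largest |X'y|, then refit."""
    eta = np.zeros(X.shape[1])
    if T > 0:
        A = np.argsort(-np.abs(X.T @ y))[:T]
        eta[A] = np.linalg.lstsq(X[:, A], y, rcond=None)[0]
    return eta

def select_T(X, y, fit=top_t_refit, alpha=1.0):
    """Compute eta_hat(T) for T = 0, ..., Q and pick the T minimizing HBIC."""
    n = X.shape[0]
    Q = int(alpha * n / math.log(n))
    path = [fit(X, y, T) for T in range(Q + 1)]
    scores = [hbic(X, y, eta) for eta in path]
    T_hat = int(np.argmin(scores))
    return T_hat, path[T_hat]

rng = np.random.default_rng(0)
n, p = 400, 100
X = rng.standard_normal((n, p))
eta_true = np.zeros(p)
eta_true[:3] = [3.0, -3.0, 3.0]
y = X @ eta_true + 0.5 * rng.standard_normal(n)
T_hat, eta_hat = select_T(X, y)
print("selected support size:", T_hat)
```

Cross-validation could replace `hbic` by scoring each $\hat{η}(T)$ on held-out folds; the loop over T is unchanged.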

Numerical studies

In this section, we conduct simulation studies and real data analysis to illustrate the effectiveness of the proposed method. We compare the simulation results of AFT-SDAR/AFT-ASDAR with those of Lasso and MCP in terms of accuracy and efficiency. We also evaluate the performance under different study designs by considering the sample size n, the variable dimension p, the correlation measure ρ among covariates and the censoring rate c.r. Moreover, we examine the average number of iterations for

Conclusion

In this paper, we consider the $ℓ_{0}$-penalized method for estimation and variable selection under the high-dimensional AFT model. We extend the SDAR algorithm for linear regression to the AFT model with censored survival data based on a weighted least squares criterion. The proposed AFT-SDAR algorithm is a constructive approach for approximating $ℓ_{0}$-penalized weighted least squares solutions. In theoretical analysis, we establish the $ℓ_{\infty}$ nonasymptotic error bounds for the solution sequence

Acknowledgement

The authors thank two anonymous reviewers and the Associate Editor for many valuable comments and suggestions, which have helped to improve the quality of the article. Dr. Feng's work was partially supported by National Natural Science Foundation of China (No. 11971292). Dr. Jiao's work was partially supported by National Natural Science Foundation of China (No. 11871474).

References (32)

J. Hu et al. Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates. J. Multivar. Anal. (2013)

P. Breheny et al. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. (2011)

J. Buckley et al. Linear regression with censored data. Biometrika (1979)

T. Cai et al. Regularized estimation for the accelerated failure time model. Biometrics (2009)

D.R. Cox. Regression models and life-tables. J. R. Stat. Soc., Ser. B, Stat. Methodol. (1972)

J. Fan et al. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. (2001)

J. Fan et al. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc., Ser. B, Stat. Methodol. (2008)

D. Hong et al. Weighted elastic net model for mass spectrometry imaging processing. Math. Model. Nat. Phenom. (2010)

J. Huang et al. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Anal. (2010)

J. Huang et al. Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics (2006)

J. Huang et al. A constructive approach to $ℓ_{0}$ penalized regression. J. Mach. Learn. Res. (2018)

B.A. Johnson. Variable selection in semiparametric linear regression with censored data. J. R. Stat. Soc., Ser. B, Stat. Methodol. (2008)

B.A. Johnson et al. Penalized estimating functions and variable selection in semiparametric regression models. J. Am. Stat. Assoc. (2008)

J.D. Kalbfleisch et al. The Statistical Analysis of Failure Time Data, vol. 360 (2011)

M.H.R. Khan et al. Variable selection for survival data with a class of adaptive elastic net techniques. Stat. Comput. (2016)

H. Koul et al. Regression analysis with randomly right-censored data. Ann. Stat. (1981)


