Abstract
Broken adaptive ridge (BAR) is a penalized regression method that performs variable selection through a computationally scalable surrogate to \(L_0\) regularization. BAR has many appealing features: obtained as the limit of iteratively reweighted \(L_2\) penalties, it approximates \(L_0\)-penalized selection and satisfies the oracle property together with a grouping effect for highly correlated covariates. In this paper, we investigate the BAR procedure for variable selection in a semiparametric accelerated failure time model with complex high-dimensional censored data. Coupled with Buckley-James-type responses, the BAR-based variable selection procedure accommodates event times that are censored in complex ways, including right censoring, left censoring, and double censoring. Our approach employs a two-stage cyclic coordinate descent algorithm that minimizes the objective function by iteratively updating the pseudo survival responses and the regression coefficients along the coordinate directions. Under weak regularity conditions, we establish both the oracle property and the grouping effect of the proposed BAR estimator. Numerical studies examine the finite-sample performance of the proposed algorithm, and an application to real data illustrates its practical use.
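To fix ideas, the following is a minimal numerical sketch of the two-stage scheme described above for the right-censored case: an outer Buckley-James step imputes censored responses using a Kaplan-Meier estimate of the residual distribution, and an inner broken adaptive ridge step iteratively reweights an \(L_2\) penalty. All names (km_jumps, bj_impute, bar, and the toy data) are illustrative assumptions rather than the authors' implementation, and the ridge updates are written as direct linear solves instead of the cyclic coordinate descent updates used in the paper.

```python
import numpy as np

def km_jumps(times, events):
    """Kaplan-Meier jump sizes at the sorted points (a crude sketch:
    ties and censoring beyond the last event are not treated carefully)."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    n = len(t)
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - d / at_risk)              # S(t) just after each point
    jumps = -np.diff(np.concatenate(([1.0], surv)))   # probability mass at each point
    return t, jumps

def bj_impute(X, y, delta, beta):
    """One Buckley-James step for right-censored data: replace each censored
    response by its estimated conditional expectation given it exceeds y_i."""
    resid = y - X @ beta
    t, jumps = km_jumps(resid, delta)
    y_star = y.astype(float).copy()
    for i in np.where(delta == 0)[0]:
        mask = t > resid[i]
        w = jumps[mask]
        if w.sum() > 0:
            y_star[i] = X[i] @ beta + np.sum(w * t[mask]) / w.sum()
    return y_star

def bar(X, y_star, lam, n_iter=100, eps=1e-12):
    """Broken adaptive ridge: iteratively reweighted ridge regressions
    whose limit mimics L0-penalized least squares."""
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y_star)   # ridge start
    for _ in range(n_iter):
        D = np.diag(lam / np.maximum(beta**2, eps))             # D(beta) = diag(beta^{-2})
        beta = np.linalg.solve(X.T @ X + D, X.T @ y_star)
    return beta

# Toy example with right-censored log event times.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.array([1.5, -1.0, 0.8] + [0.0] * (p - 3))
T = X @ beta_true + rng.standard_normal(n)          # log event times
C = X @ beta_true + rng.standard_normal(n) + 0.5    # log censoring times
y, delta = np.minimum(T, C), (T <= C).astype(float)

beta = np.zeros(p)
for _ in range(20):                   # outer Buckley-James loop
    y_star = bj_impute(X, y, delta, beta)
    beta = bar(X, y_star, lam=2.0)    # inner BAR loop
print(np.round(beta, 3))              # noise coordinates shrink to zero
```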


References
Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24(6):2350–2383
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66(3):429–436
Choi S, Cho H (2019) Accelerated failure time models for the analysis of competing risks. J Korean Stat Soc 48:315–326
Choi T, Choi S (2021) A fast algorithm for the accelerated failure time model with high-dimensional time-to-event data. J Stat Comput Simul 91(16):3385–3403
Choi S, Choi T, Cho H, Bandyopadhyay D (2022) Weighted least-squares regression with competing risks data. Stat Med 41(2):227–241
Choi T, Kim AK, Choi S (2021) Semiparametric least-squares regression with doubly-censored data. Comput Stat Data Anal 164:107306
Dai L, Chen K, Li G (2020) The broken adaptive ridge procedure and its applications. Statistica Sinica 30(2):1069–1094
Dai L, Chen K, Sun Z, Liu Z, Li G (2018) Broken adaptive ridge regression and its asymptotic properties. J Multivar Anal 168:334–351
Daubechies I, DeVore R, Fornasier M, Güntürk CS (2010) Iteratively reweighted least squares minimization for sparse recovery. Commun Pure Appl Math 63(1):1–38
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Frommlet F, Nuel G (2016) An adaptive ridge procedure for \(l_0\) regularization. PLoS ONE 11(2):e0148620
Gao F, Zeng D, Lin DY (2017) Semiparametric estimation of the accelerated failure time model with partly interval-censored data. Biometrics 73(4):1161–1168
Huang J (1999) Asymptotic properties of nonparametric estimation based on partly interval-censored data. Statistica Sinica 9(2):501–519
Jin Z, Lin D, Wei L, Ying Z (2003) Rank-based inference for the accelerated failure time model. Biometrika 90(2):341–353
Jin Z, Lin D, Ying Z (2006) On least-squares regression with censored data. Biometrika 93(1):147–161
Johnson BA (2009) On lasso for censored data. Electron J Stat 3:485–506
Johnson BA, Lin DY, Zeng D (2008) Penalized estimating functions and variable selection in semiparametric regression models. J Am Stat Assoc 103(482):672–680
Kawaguchi ES, Shen JI, Suchard MA, Li G (2021) Scalable algorithms for large competing risks data. J Comput Graph Stat 30(3):685–693
Kawaguchi ES, Suchard MA, Liu Z, Li G (2020) A surrogate \({L}_0\) sparse Cox’s regression with applications to sparse high-dimensional massive sample size time-to-event data. Stat Med 39(6):675–686
Leurgans S (1987) Linear models, random censoring and synthetic data. Biometrika 74(2):301–309
Li Y, Dicker L, Zhao S (2014) The Dantzig selector for censored linear regression models. Statistica Sinica 24(1):251–268
Liu Y, Chen X, Li G (2019) A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Stat Methods Med Res 29(6):1499–1513
Meir A, Keeler E (1969) A theorem on contraction mappings. J Math Anal Appl 28(2):326–329
Rippe RC, Meulman JJ, Eilers PH (2012) Visualization of genomic changes by segmented smoothing using an \(l_0\) penalty. PLoS ONE 7(6):e38230
Ritov Y (1990) Estimation in a linear regression model with censored data. Ann Stat 18(1):303–328
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88(422):486–494
Son M, Choi T, Shin SJ, Jung Y, Choi S (2021) Regularized linear censored quantile regression. J Korean Stat Soc 51:1–19
Sun Z, Liu Y, Chen K, Li G (2022) Broken adaptive ridge regression for right-censored survival data. Ann Inst Stat Math 74(1):69–91
Sun Z, Yu C, Li G, Chen K, Liu Y (2020) CenBAR: broken adaptive ridge AFT model with censored data. https://cran.r-project.org/web/packages/CenBAR/index.html, R package version 0.1.1
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Series B (Methodological) 58(1):267–288
Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J Royal Stat Soc Ser B 38(3):290–295
Wang S, Nan B, Zhu J, Beer DG (2008) Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics 64(1):132–140
Xu J, Leng C, Ying Z (2010) Rank-based variable selection with censored data. Stat Comput 20(2):165–176
Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102(480):1387–1396
Zhao H, Sun D, Li G, Sun J (2018) Variable selection for recurrent event data with broken adaptive ridge regression. Can J Stat 46(3):416–428
Zhao H, Wu Q, Li G, Sun J (2020) Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression. J Am Stat Assoc 115(529):204–216
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Funding
The research of T. Choi was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (RS-2023-00237435). The research of S. Choi was supported by grants from the National Research Foundation of Korea (NRF) (2022M3J6A1063595, 2022R1A2C1008514) and by Korea University research grants (K2018721, K2008341).
Ethics declarations
Conflict of interest
We declare that we have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
In this appendix, we prove the asymptotic properties of the proposed BJ-BAR estimator \(\hat{\beta }\). For the proof, define
\[ g({\beta }) = \big (X^T X + \lambda _n {D}({\beta })\big )^{-1} X^T \hat{Y} \equiv \big ({\alpha }^*({\beta })^T, {\gamma }^*({\beta })^T\big )^T, \qquad (10) \]
where \({D}({\beta }) = \text{diag}(\beta _1^{-2},\dots ,\beta _p^{-2})\), and the partition of matrix \({\Sigma }_n^{-1}\) as
\[ {\Sigma }_n^{-1} = \begin{pmatrix} A & B \\ B^T & G \end{pmatrix}, \]
where \(A\) is a \(q\times q\) matrix and \(\hat{Y}=\hat{Y}(\tilde{\beta })\) is the "imputed" failure time. We write \({\alpha }^*({\beta })\) and \({\gamma }^*({\beta })\) as \({\alpha }^*\) and \({\gamma }^*\), where \({\alpha }^*\) is a \(q\times 1\) vector and \({\gamma }^*\) is a \((p-q)\times 1\) vector. Note that since \({\Sigma }_n=n^{-1} X^T X\) is nonsingular, by multiplying \((X^T X)^{-1} (X^T X + \lambda _n {D}(\beta ))\) and subtracting \(\beta _0=(\beta _{10}^T,0^T)^T\) on both sides of Eq. (10), we have
\[ \begin{pmatrix} {\alpha }^* - {\beta }_{10} \\ {\gamma }^* \end{pmatrix} = (\tilde{\beta } - {\beta }_0) - \frac{\lambda _n}{n} \begin{pmatrix} A & B \\ B^T & G \end{pmatrix} \begin{pmatrix} {D}_1({\beta }_1)\, {\alpha }^* \\ {D}_2({\beta }_2)\, {\gamma }^* \end{pmatrix}, \qquad (11) \]
since \(\tilde{\beta }=(X^TX)^{-1}X^T \hat{Y}(\tilde{\beta })\), where \({D}_1({\beta }_1) = \text{diag}(\beta _1^{-2})\) and \({D}_2({\beta }_2) = \text{diag}(\beta _2^{-2})\). We also define \({\gamma }^*=0\) if \({\beta _2}=0\) in Eq. (11). According to Ritov (1990), we have \(\Vert {\tilde{\beta }} -{\beta }_0\Vert = O_p(n^{-1/2}).\)
The proof of Theorem 1 rests on the following two lemmas, which are adapted from Dai et al. (2018) and Zhao et al. (2018, 2020).
Lemma 1
Let \(\{\, \delta _n \, \}\) be a sequence of positive real numbers satisfying \(\delta _n\rightarrow \infty\) and \(\delta _n^2/\lambda _n \rightarrow 0\). Define \(H= \{{\beta } = ({\beta }_1^T, {\beta }_2^T)^T: {\beta }_1 \in [1/K_0, K_0]^{q}, \Vert {\beta }_2\Vert \le \delta _n/\sqrt{n}\}\), where \(K_0>1\) is a constant such that \({\beta }_{10}\in [1/K_0, K_0]^{q}\). Then under the regularity conditions (C1)–(C6) and with probability tending to 1, we have
(i) \(\displaystyle \sup _{{\beta }\in H}\dfrac{\Vert {\gamma }^*({\beta })\Vert }{\Vert {\beta }_2\Vert } < \dfrac{1}{c_0}\) for some constant \(c_0>1\);

(ii) \(g(\cdot )\) is a mapping from \(H\) to itself.
Proof
By Eq. (11) and Ritov (1990), we have
\[ {\gamma }^* = \tilde{\beta }_2 - \frac{\lambda _n}{n} \big (B^T {D}_1({\beta }_1)\, {\alpha }^* + G\, {D}_2({\beta }_2)\, {\gamma }^*\big ), \qquad \text{with } \Vert \tilde{\beta }_2\Vert = O_p(n^{-1/2}). \]
Note that \(\Vert \beta _2\Vert \le \delta _n / \sqrt{n}\) and \(\lambda _n/n=o_p(n^{-1/2})\). Based on conditions (C5) and (C6) and the fact that \({\beta }_1 \in [1/K_0, K_0]^{q}\) and \(\Vert {\alpha }^*\Vert \le \Vert g({\beta })\Vert < K\) for some constant \(K>0\), we have
where \(a_0 = \min _{1 \le j \le q} |\beta _{10_j} |\), \(a_1 = \max _{1 \le j \le q} |\beta _{10_j} |\), and \(\Vert B^T\Vert \le \sqrt{2}c\), which follows from the inequality \(\Vert BB^T\Vert - \Vert A^2 \Vert \le \Vert BB^T + A^2 \Vert \le \Vert {\Sigma }_n ^{-2}\Vert < c^2.\) Then, it follows from Eq. (11) that, with probability tending to 1,
because \(\lambda _{\min }({ G})> c^{-1}\). Let \(m_{{\gamma }^*/{\beta }_2}=(\gamma _1^*/\beta _{{q}+1}, \gamma _2^*/\beta _{{q}+2},\dots , \gamma _{p-q}^*/\beta _{p})^T\). It follows from the Cauchy-Schwarz inequality, the assumption \(\Vert {\beta }_2\Vert \le \delta _n/\sqrt{n}\), and the definition \({ D}_2({\beta }_2) = \text{diag}(\beta _2^{-2})\) that
where \(\odot\) denotes the component-wise product and
for all large n. Thus, Eq. (12), together with (13) and (14), implies that
Since \(\delta _n^2/\lambda _n\rightarrow 0\), we have
for some constant \(c_0>1\), with probability tending to 1. Hence, it follows from inequalities (14) and (15) that
which implies that conclusion (i) holds.
To prove (ii), we need to verify that \({\alpha }^*\in [1/K_0, K_0]^{q}\) with probability tending to 1. By Eq. (11) and the results of Ritov (1990), we have
\[ {\alpha }^* - {\beta }_{10} = (\tilde{\beta }_1 - {\beta }_{10}) - \frac{\lambda _n}{n} \big (A\, {D}_1({\beta }_1)\, {\alpha }^* + B\, {D}_2({\beta }_2)\, {\gamma }^*\big ), \qquad \text{with } \Vert \tilde{\beta }_1 - {\beta }_{10}\Vert = O_p(n^{-1/2}). \]
Similarly, given condition (C5), \(\beta _1 \in [1/K_0, K_0]^{q}\), and \(\Vert {\alpha }^*\Vert < K\),
Then, from Eq. (11), we have
Also, according to inequalities in (12) and condition (C5), we know that as \(n\rightarrow \infty\) and with probability tending to 1,
Therefore, from (17) and (18), we obtain
\[ \Vert {\alpha }^* - {\beta }_{10}\Vert \rightarrow 0 \]
with probability tending to 1, which implies that for any \(\varepsilon >0\), \(P(\Vert {\alpha }^*-{\beta }_{10}\Vert \le \varepsilon )\rightarrow 1\). Since \({\beta }_{10}\in [1/K_0, K_0]^{q}\), it follows that \({\alpha }^* \in [1/K_0, K_0]^{q}\) for large \(n\). Together with the fact that \(\Vert {\gamma }^*\Vert \le \delta _n/\sqrt{n}\), this proves (ii). \(\square\)
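As a numerical aside (not a substitute for the argument above), conclusion (i) can be probed directly: when \(\lambda _n\) is large relative to \(\Vert {\beta }_2\Vert ^2\), the tail block \({\gamma }^*({\beta })\) of \(g({\beta })\) is strictly contracted relative to \({\beta }_2\). A minimal sketch, assuming a Gaussian toy design and treating \(\hat{Y}\) as already imputed:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, p, lam = 400, 3, 8, 50.0
X = rng.standard_normal((n, p))
Y_hat = X[:, :q] @ np.array([2.0, 1.0, 0.5]) + rng.standard_normal(n)

def g(beta):
    # g(beta) = (X'X + lam * D(beta))^{-1} X'Y_hat with D(beta) = diag(beta^{-2})
    D = np.diag(lam / beta**2)
    return np.linalg.solve(X.T @ X + D, X.T @ Y_hat)

beta = np.concatenate([np.array([2.0, 1.0, 0.5]),          # beta_1 in [1/K0, K0]^q
                       np.full(p - q, 0.5 / np.sqrt(n))])  # small tail beta_2
gamma_star = g(beta)[q:]
print(np.linalg.norm(gamma_star) / np.linalg.norm(beta[q:]))  # ratio well below 1
```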
Lemma 2
Under the regularity conditions (C1)–(C6) and with probability tending to 1, the equation \({\alpha } = (X_1 ^T X_1 + \lambda _n {D}_1({\alpha }))^{-1} X_1^T \hat{Y}\) has a unique fixed point \(\hat{\alpha }^*\) in the domain \([1/K_0, K_0]^{q}\).
Proof
Since \({\beta }_{20}=0\), we define
where \({\alpha }=(\alpha _1,\dots ,\alpha _{q})^T\). Note that \((f({\alpha })^T,0^T)^T = g(({\alpha }^T, 0^T)^T)\) and \(f({\alpha })\) is a map from \([1/K_0, K_0]^q\) to itself. Multiplying \((X_1 ^T X_1 + \lambda _n {D}_1({\alpha }))\) and taking derivative with respect to \({\alpha }\) on both sides of Eq. (19), we have
where \({f}^{\prime }({\alpha })=\partial f({\alpha })/\partial {\alpha }^T\). Then
According to condition (C5) and the fact that \({\alpha }\in [1/K_0, K_0]^{q}\), we can derive
Thus, \(\sup _{{\alpha }\in [1/K_0, K_0]^{q}}\Vert {f}^{\prime }({\alpha })\Vert \rightarrow 0\), which implies that \(f(\cdot )\) is a contraction mapping from \([1/K_0, K_0]^{q}\) to itself with probability tending to 1 (Meir and Keeler 1969). Hence, by the contraction mapping theorem, there exists a unique fixed point \(\hat{\alpha }^* \in [1/K_0, K_0]^{q}\) such that \(\hat{\alpha }^* = f(\hat{\alpha }^*)\).
\(\square\)
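The contraction behavior underlying Lemma 2 is also easy to observe numerically: iterating the map \(f({\alpha }) = (X_1^T X_1 + \lambda _n {D}_1({\alpha }))^{-1} X_1^T \hat{Y}\) from different starting values in \([1/K_0, K_0]^{q}\) yields the same fixed point. A minimal sketch under a similar assumed toy setting (illustrative only, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, lam = 500, 3, 2.0
X1 = rng.standard_normal((n, q))
Y_hat = X1 @ np.array([2.0, 1.0, 0.5]) + rng.standard_normal(n)  # all signals nonzero

def f(alpha):
    # f(alpha) = (X1'X1 + lam * D1(alpha))^{-1} X1'Y_hat with D1(alpha) = diag(alpha^{-2})
    D1 = np.diag(lam / alpha**2)
    return np.linalg.solve(X1.T @ X1 + D1, X1.T @ Y_hat)

for start in (np.full(q, 0.2), np.full(q, 5.0)):  # two starts in [1/K0, K0]^q with K0 = 5
    alpha = start
    for _ in range(100):
        alpha = f(alpha)
    print(np.round(alpha, 6))  # both runs print the same fixed point
```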
Proof of Theorem 1
First, we prove the sparsity of the BJ-BAR estimator. According to the definitions of \(\hat{\beta }_2\) and \(\hat{\beta }_2^{(k)}\), it follows from inequality (15) that
\[ \hat{\beta }_2 = \lim _{k\rightarrow \infty } \hat{\beta }_2^{(k)} = 0 \]
holds with probability tending to 1.
Next, we show that \(P( \hat{\beta }_1 = \hat{\alpha }^*)\rightarrow 1\). Recall from Eq. (11) that we define \({\gamma }^*=0\) if \({\beta _2}=0\); indeed, for any fixed large \(n\), Eq. (11) gives \(\lim _{\beta _2\rightarrow 0} \gamma ^*(\beta )=0\). By multiplying \((X^T X +\lambda _n {D}(\beta ))\) on both sides of Eq. (10), we have
Combining Eqs. (20) and (22), we have
Since \(f(\cdot )\) is a contraction mapping, it follows from Eq. (20) that
Let \(h_k=\Vert \hat{\beta }_1^{(k)}-\hat{\alpha }^*\Vert\). Then, from (23) and (24), we get
From (23), for any \(\varepsilon > 0\), there exists \(N>0\) such that \(|\eta _k|<\varepsilon\) for all \(k>N\). Following a recursive calculation as in Dai et al. (2018), we can show that, with probability tending to 1, \(h_k\rightarrow 0\) as \(k\rightarrow \infty\). Since \(\hat{\beta }_1 = \lim _{k\rightarrow \infty }\hat{\beta }_1^{(k)}\) and the fixed point is unique, we have \(P(\hat{\beta }_1 = \hat{\alpha }^* )\rightarrow 1\), completing the proof of Theorem 1(i).
Finally, based on Eq. (11), condition (C5), and the fact that \(\lambda _n/n=o_p(n^{-1/2})\), we get \(\sqrt{n}(\hat{\beta }_1 - {\beta }_{10})\approx \sqrt{n}(\tilde{\beta }_1 - {\beta }_{10})\), where \(\tilde{\beta }_1\) denotes the first \(q\) elements of \(\tilde{\beta }\). Theorem 1(ii) then follows from the asymptotic normality of \(\tilde{\beta }\) (Ritov 1990). \(\square\)
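As a numerical illustration of the sparsity and oracle behavior just established (again under an assumed Gaussian toy design, with \(\hat{Y}\) treated as given), the BAR fixed-point iteration zeroes out the noise coordinates while the surviving estimates stay close to the oracle least-squares fit on the true support, up to a small penalty bias:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 400, 10, 4.0
X = rng.standard_normal((n, p))
beta0 = np.array([1.5, -1.0, 0.8] + [0.0] * 7)   # q = 3 signals, 7 noise covariates
Y_hat = X @ beta0 + rng.standard_normal(n)

beta = np.linalg.solve(X.T @ X + np.eye(p), X.T @ Y_hat)  # ridge start
for _ in range(200):                                      # BAR fixed-point iteration
    D = np.diag(lam / np.maximum(beta**2, 1e-12))
    beta = np.linalg.solve(X.T @ X + D, X.T @ Y_hat)

oracle = np.linalg.solve(X[:, :3].T @ X[:, :3], X[:, :3].T @ Y_hat)
print(np.round(beta[3:], 4))                        # noise coordinates: numerically zero
print(np.round(beta[:3], 3), np.round(oracle, 3))   # signal estimates track the oracle fit
```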
Proof of Theorem 2
Recall that \(\hat{\beta }= \lim _{k\rightarrow \infty } \hat{\beta }^{(k+1)}\) and \(\hat{\beta }^{(k+1)}=\arg \min _{\beta } Q\big (\beta \mid \hat{\beta }^{(k)}\big )\), where
\[ Q\big (\beta \mid \hat{\beta }^{(k)}\big ) = \big \Vert \hat{Y} - X\beta \big \Vert ^2 + \lambda _n \sum _{j=1}^{p} \frac{\beta _j^2}{\big (\hat{\beta }_j^{(k)}\big )^2}. \]
If \(\beta _{\ell } \ne 0\) for \(\ell \in \{i, j\}\), then \(\hat{\beta }^{(k+1)}\) must satisfy the following normal equations for \(\ell \in \{i, j\}\):
\[ X_{\ell }^{T} \big (\hat{Y} - X \hat{\beta }^{(k+1)}\big ) = \lambda _n\, \hat{\beta }_{\ell }^{(k+1)} \big / \big (\hat{\beta }_{\ell }^{(k)}\big )^2. \]
Thus, for \(\ell \in \{i, j\}\),
\[ \hat{\beta }_{\ell }^{(k+1)} = \big (\hat{\beta }_{\ell }^{(k)}\big )^2\, X_{\ell }^{T} \hat{\varepsilon }^{*(k+1)} \big / \lambda _n, \qquad (25) \]
where \(\hat{\varepsilon }^{*(k+1)}=\hat{Y}-X \hat{\beta }^{(k+1)}\). Since
\[ \big \Vert \hat{\varepsilon }^{*(k+1)}\big \Vert ^2 \le Q\big (\hat{\beta }^{(k+1)} \mid \hat{\beta }^{(k)}\big ) \le Q\big (0 \mid \hat{\beta }^{(k)}\big ) = \big \Vert \hat{Y}\big \Vert ^2, \]
we have
\[ \big \Vert \hat{\varepsilon }^{*(k+1)}\big \Vert \le \big \Vert \hat{Y}\big \Vert . \qquad (26) \]
Letting \(k \rightarrow \infty\) in (25) and (26), we obtain \(\big \Vert \hat{\varepsilon }^{*}\big \Vert \le \big \Vert \hat{Y}\big \Vert\) and, for \(\ell \in \{i, j\}\), \(\hat{\beta }_{\ell }^{-1}=X_{\ell }^{T} \hat{\varepsilon }^{*} / \lambda _n\), where \(\hat{\varepsilon }^{*}=\hat{Y}-X \hat{\beta }\). Therefore,
\[ \big |\hat{\beta }_i^{-1} - \hat{\beta }_j^{-1}\big | = \frac{\big |(X_i - X_j)^{T} \hat{\varepsilon }^{*}\big |}{\lambda _n} \le \frac{\Vert X_i - X_j\Vert \, \big \Vert \hat{Y}\big \Vert }{\lambda _n}, \]
which completes the proof.
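The final display also suggests a direct numerical check of the grouping effect: for two highly correlated columns with nonzero estimates, the reciprocals \(\hat{\beta }_i^{-1}\) and \(\hat{\beta }_j^{-1}\) can differ by at most \(\Vert X_i - X_j\Vert \Vert \hat{Y}\Vert /\lambda _n\). A minimal sketch under an assumed correlated Gaussian design (treating \(\hat{Y}\) as given):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 300, 1.0
z = rng.standard_normal(n)
X = np.column_stack([z, z + 0.2 * rng.standard_normal(n),  # columns 0, 1 highly correlated
                     rng.standard_normal(n)])
Y_hat = X @ np.array([1.0, 1.0, -0.5]) + 0.1 * rng.standard_normal(n)

beta = np.linalg.solve(X.T @ X + np.eye(3), X.T @ Y_hat)  # ridge start
for _ in range(200):                                      # BAR fixed-point iteration
    D = np.diag(lam / np.maximum(beta**2, 1e-12))
    beta = np.linalg.solve(X.T @ X + D, X.T @ Y_hat)

lhs = abs(1 / beta[0] - 1 / beta[1])
rhs = np.linalg.norm(X[:, 0] - X[:, 1]) * np.linalg.norm(Y_hat) / lam
print(lhs <= rhs, round(lhs, 4), round(rhs, 2))  # the Theorem 2 bound holds
```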
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lee, J., Choi, T. & Choi, S. Censored broken adaptive ridge regression in high-dimension. Comput Stat 39, 3457–3482 (2024). https://doi.org/10.1007/s00180-023-01446-1