
Usage of the GO estimator in high dimensional linear models

  • Original paper
  • Published in: Computational Statistics

Abstract

This paper considers simultaneous parameter estimation and variable selection and presents a new penalized regression method. The method is based on the idea that the coefficient estimates are shrunken towards a predetermined coefficient vector that represents the prior information. Depending on the prior information, the method can produce coefficient estimates with a smaller length than the elastic net. In addition to establishing the grouping property, we show that the new method exhibits the grouping effect when the predictors are highly correlated. Simulation studies and a real data example show that the new method improves on the prediction performance of the well-known ridge, lasso and elastic net regression methods, yielding a lower mean squared error, and is competitive in variable selection under both sparse and non-sparse settings.
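For concreteness, the criterion described above can be prototyped with a cyclic coordinate-descent update in the spirit of Friedman et al. (2010). The sketch below is illustrative only; the function go_coordinate_descent and all variable names are our own assumptions, not the authors' implementation.

```python
# Minimal sketch (an assumption, not the paper's code) of the penalized criterion
#   (1/2n) * ||y - X beta||_2^2 + lam * ( alpha * ||beta||_1
#                                         + (1 - alpha)/2 * ||beta - b||_2^2 ),
# solved by cyclic coordinate descent with soft-thresholding.
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def go_coordinate_descent(X, y, b, lam, alpha, n_iter=200, tol=1e-8):
    n, p = X.shape
    beta = b.astype(float).copy()          # start from the prior vector b
    col_ss = (X ** 2).sum(axis=0) / n      # (1/n) * ||x_j||_2^2
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            # partial residual with predictor j removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j / n + lam * (1 - alpha) * b[j]
            beta[j] = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```

Setting b to the zero vector reduces this update to that of the naive elastic net of Zou and Hastie (2005), which gives a quick way to sanity-check such a sketch.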


Notes

  1. This method was originally called the naive elastic net by Zou and Hastie (2005). The authors use a scaled version of the method and call it the elastic net, but we follow Friedman et al. (2010), who drop this distinction.

References

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122

  • Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view toward applications in biology. Ann Rev Stat Appl 1(1):255–278

  • Donoho DL, Johnstone JM (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425–455

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499

  • Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin

  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the lasso and generalizations. CRC Press, Boca Raton

  • Hoerl AE, Kennard RW (1970) Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15(1):2869–2909

  • Özkale MR, Kaçıranlar S (2007) The restricted and unrestricted two-parameter estimators. Commun Stat Theory Methods 36(15):2707–2725

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58:267–288

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B (Stat Methodol) 67(1):91–108

  • Wang Y, Jiang Y, Zhang J, Chen Z, Xie B, Zhao C (2019) Robust variable selection based on the random quantile lasso. Commun Stat Simul Comput 1–11

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68(1):49–67

  • Zhang C, Wu Y, Zhu M (2019) Pruning variable selection ensembles. Stat Anal Data Min ASA Data Sci J 12(3):168–184

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320


Author information

Corresponding author

Correspondence to Murat Genç.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

Let

$$\begin{aligned} Q\left( \varvec{\beta };{\mathbf {b}},\lambda ,\alpha \right) =\frac{1}{2n}\left\| {\mathbf {y}}-{\mathbf {X}}\varvec{\beta }\right\| _{2}^{2}+\lambda \left( \alpha \left\| \varvec{\beta }\right\| _{1}+\frac{1-\alpha }{2}\left\| \varvec{\beta }-{\mathbf {b}}\right\| _{2}^{2}\right) . \end{aligned}$$

We take the sub-gradients of this function with respect to \(\beta _{i}\) and \(\beta _{j}\) and set them equal to zero:

$$\begin{aligned} \frac{\partial Q}{\partial \beta _{i}}=-\frac{1}{n}{\mathbf {x}}_{i}^{\top }\left( {\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\right) +\lambda \alpha {\hat{s}}_{i}+\lambda \left( 1-\alpha \right) \left( {\hat{\beta }}_{i}-b_{i}\right)&=0 \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial Q}{\partial \beta _{j}}=-\frac{1}{n}{\mathbf {x}}_{j}^{\top }\left( {\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\right) +\lambda \alpha {\hat{s}}_{j}+\lambda \left( 1-\alpha \right) \left( {\hat{\beta }}_{j}-b_{j}\right)&=0, \end{aligned}$$
(13)

where \({\hat{s}}_{i}\) and \({\hat{s}}_{j}\) are sub-gradients of the absolute value function at \({\hat{\beta }}_{i}\) and \({\hat{\beta }}_{j}\), respectively.
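Explicitly,

$$\begin{aligned} {\hat{s}}_{k}\in \partial \left| {\hat{\beta }}_{k}\right| =\left\{ \begin{array}{ll} \left\{ \mathrm{sign}\left( {\hat{\beta }}_{k}\right) \right\} , &{} {\hat{\beta }}_{k}\ne 0,\\ \left[ -1,1\right] , &{} {\hat{\beta }}_{k}=0, \end{array}\right. \qquad k\in \left\{ i,j\right\} , \end{aligned}$$

so that \({\hat{s}}_{i}={\hat{s}}_{j}\) whenever \({\hat{\beta }}_{i}\) and \({\hat{\beta }}_{j}\) are both nonzero and share the same sign.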

Subtracting Eq. (12) from Eq. (13), using \({\hat{s}}_{i}={\hat{s}}_{j}\) so that the sub-gradient terms cancel, and applying the Cauchy–Schwarz inequality, we get

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right| \le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}\left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}}, \end{aligned}$$
(14)

where \(\hat{{\mathbf {r}}}={\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\) is the residual vector. Since the predictors are standardized, \(\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}=2\left( 1-\rho \right) \) with \(\rho ={\mathbf {x}}_{i}^{\top }{\mathbf {x}}_{j}\), and we obtain

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right| \le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{2\left( 1-\rho \right) \left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}}. \end{aligned}$$
(15)

Furthermore, \(Q\left( \hat{\varvec{\beta }};{\mathbf {b}},\lambda ,\alpha \right) \le Q\left( {\mathbf {0}};{\mathbf {b}},\lambda ,\alpha \right) \) holds because \(\hat{\varvec{\beta }}\) is the minimizer of Q. Hence, we write

$$\begin{aligned} \frac{1}{2n}\left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}+\lambda \alpha \left\| \hat{\varvec{\beta }}\right\| _{1}+\frac{\lambda \left( 1-\alpha \right) }{2}\left\| \hat{\varvec{\beta }}-{\mathbf {b}}\right\| _{2}^{2}\le \frac{1}{2n}\left\| {\mathbf {y}}\right\| _{2}^{2}+\frac{\lambda \left( 1-\alpha \right) }{2}\left\| {\mathbf {b}}\right\| _{2}^{2} \end{aligned}$$

which implies that

$$\begin{aligned} \left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}\le \left\| {\mathbf {y}}\right\| _{2}^{2}+n\lambda \left( 1-\alpha \right) \left\| {\mathbf {b}}\right\| _{2}^{2}. \end{aligned}$$
(16)

Combining Eqs. (15) and (16), we obtain

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right|&\le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{2\left( 1-\rho \right) }\sqrt{\left\| {\mathbf {y}}\right\| _{2}^{2}+n\lambda \left( 1-\alpha \right) \left\| {\mathbf {b}}\right\| _{2}^{2}} \end{aligned}$$

which completes the proof.\(\square \)
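The bound can also be checked numerically. The following self-contained sketch (with hypothetical names; not part of the paper) fits the estimator on a design containing two highly correlated unit-norm predictors and verifies that the left-hand side of Eq. (15) is dominated by the right-hand sides of Eqs. (15) and (16):

```python
# Informal numerical check of Eqs. (15)-(16); an illustrative sketch under the
# proof's assumptions (standardized predictors, same-signed estimates).
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def go_fit(X, y, b, lam, alpha, n_iter=500):
    """Same coordinate-descent update as the sketch after the abstract."""
    n, p = X.shape
    beta = b.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j / n + lam * (1 - alpha) * b[j]
            beta[j] = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
    return beta

rng = np.random.default_rng(0)
n, p = 100, 5
x1 = rng.standard_normal(n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95 ** 2) * rng.standard_normal(n)
X = np.column_stack([x1, x2, rng.standard_normal((n, 3))])
X /= np.linalg.norm(X, axis=0)           # unit-norm columns, as in the proof
# equal positive coefficients on the correlated pair keep the estimates same-signed
y = X @ np.array([3.0, 3.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)
b = np.zeros(p)                          # prior vector; b = 0 recovers the elastic net
lam, alpha = 0.05, 0.5

beta = go_fit(X, y, b, lam, alpha)
r_hat = y - X @ beta
rho = X[:, 0] @ X[:, 1]                  # realized correlation of the first two columns
lhs = abs(beta[1] - beta[0] - (b[1] - b[0]))
rhs15 = np.sqrt(2 * (1 - rho) * (r_hat @ r_hat)) / (n * lam * (1 - alpha))
rhs16 = (np.sqrt(2 * (1 - rho))
         * np.sqrt(y @ y + n * lam * (1 - alpha) * (b @ b))
         / (n * lam * (1 - alpha)))
print(lhs <= rhs15 <= rhs16, lhs, rhs15, rhs16)   # expect True
```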


Cite this article

Genç, M., Özkale, M.R. Usage of the GO estimator in high dimensional linear models. Comput Stat 36, 217–239 (2021). https://doi.org/10.1007/s00180-020-01001-2
