
An iterative approach to minimize the mean squared error in ridge regression

Original Paper, Computational Statistics

Abstract

Methods for computing the ridge parameter have been studied for more than four decades, but there is still no way to compute its optimal value. Nevertheless, many methods have been proposed that empirically yield ridge regression estimators with smaller mean squared errors than the least squares estimator. This paper compares the mean squared errors of 26 existing methods for ridge regression in different scenarios. A new approach is also proposed, which minimizes the empirical mean squared error iteratively. It is found that the existing methods fall into two groups: those that are better, but only slightly, than the least squares method in many cases, and those that are much better than the least squares method in only some cases but can be (sometimes much) worse than it in many others. The new method, though not uniformly the best, outperforms the least squares method substantially in many cases and underperforms it only slightly in a few cases.


Figs. 1–6


References

  • Alkhamisi M, Khalaf G, Shukur G (2006) Some modifications for choosing ridge parameters. Commun Stat Theory Methods 35:2005–2020
  • Alkhamisi MA, Shukur G (2007) A Monte Carlo study of recent ridge parameters. Commun Stat Simul Comput 36:535–547
  • Allen DM (1974) The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16:125–127
  • Brown PJ (1994) Measurement, regression, and calibration. Oxford University Press, New York
  • Clark AE, Troskie CG (2006) Ridge regression—a simulation study. Commun Stat Simul Comput 35:605–619
  • Delaney NJ, Chatterjee S (1986) Use of the bootstrap and cross-validation in ridge regression. J Bus Econ Stat 4:255–262
  • Dempster AP, Schatzoff M, Wermuth N (1977) A simulation study of alternatives to ordinary least squares. J Am Stat Assoc 72:77–91
  • Dorugade AV, Kashid DN (2010) Alternative method for choosing ridge parameter for regression. Appl Math Sci 4:447–456
  • Fahrmeir L, Kneib T, Lang S, Marx B (2013) Regression. Models, methods and applications. Springer, Berlin
  • Farebrother RW (1976) Further results on the mean square error of ridge regression. J R Stat Soc Ser B 38:248–250
  • Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223
  • Groß J (2003) Linear regression. Lecture notes in statistics 175. Springer, Berlin
  • Hald A (1952) Statistical theory with engineering applications. Wiley, New York
  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
  • Hocking RR, Speed FM, Lynn MJ (1976) A class of biased estimators in linear regression. Technometrics 18:425–437
  • Hoerl AE, Kannard RW, Baldwin KF (1975) Ridge regression: some simulations. Commun Stat Theory Methods 4:105–123
  • Hoerl AE, Kennard RW (1970a) Ridge regression: applications to nonorthogonal problems. Technometrics 12:69–82
  • Hoerl AE, Kennard RW (1970b) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
  • Hoerl AE, Kennard RW (1976) Ridge regression iterative estimation of the biasing parameter. Commun Stat Theory Methods 5:77–88
  • Khalaf G, Shukur G (2005) Choosing ridge parameter for regression problems. Commun Stat Theory Methods 34:1177–1182
  • Kibria BMG (2003) Performance of some new ridge regression estimators. Commun Stat Simul Comput 32:419–435
  • Kutner MH, Nachtsheim CJ, Neter J, Li W (2005) Applied linear statistical models, 5th edn. McGraw-Hill/Irwin, Boston
  • Lawless JF, Wang P (1976) A simulation study of ridge and other regression estimators. Commun Stat Theory Methods 5:307–323
  • McDonald GC (2009) Ridge regression. Wiley Interdiscip Rev Comput Stat 1:93–100
  • McDonald GC, Galarneau DI (1975) A Monte Carlo evaluation of some ridge-type estimators. J Am Stat Assoc 70:407–416
  • Muniz G, Kibria BMG (2009) On some ridge regression estimators: an empirical comparisons. Commun Stat Simul Comput 38:621–630
  • Newhouse JP, Oman SD (1971) An evaluation of ridge estimators. Technical Report R-716-PR, The RAND Corporation
  • Nomura M (1988) On the almost unbiased ridge regression estimator. Commun Stat Simul Comput 17:729–743
  • Theobald CM (1974) Generalizations of mean square error applied to ridge regression. J R Stat Soc Ser B 36:103–106
  • Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

Acknowledgments

We thank the two referees for their helpful suggestions. Research supported by a GRF grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKBU200710).

Author information


Correspondence to Sung Nok Chiu.

Appendix: List of the ridge parameters considered

Denote by \(e_i(k)\) the residual of the \(i\)th observation in the fitted model with ridge parameter \(k\), \(H(k) = [h_{ij}(k)] = \varvec{X}(\varvec{X}'\varvec{X}+k\varvec{I})^{-1}\varvec{X}'\), \(r\) the rank of \(\varvec{X}\), \(\lambda _{\max } = \lambda _1\) the largest eigenvalue of \(\varvec{X}'\varvec{X}\), and \(\hat{\alpha }_{\max }\) the maximum among \(\hat{\alpha }_i\).
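The rules below are all stated in this canonical form. As a minimal sketch (ours, not code from the paper, and assuming a centred, full-column-rank \(\varvec{X}\) with \(n > p\)), the quantities \(\lambda _i\), \(\hat{\alpha }_i\) and \(\hat{\sigma }^2\) can be computed as:

```python
import numpy as np

def canonical_quantities(X, y):
    """Eigenvalues lambda_i of X'X, canonical OLS coefficients
    alpha_hat = Q' beta_hat, and the unbiased residual variance estimate.
    Assumes X is centred with full column rank and n > p."""
    n, p = X.shape
    lam, Q = np.linalg.eigh(X.T @ X)              # X'X = Q diag(lam) Q'
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS coefficients
    alpha_hat = Q.T @ beta_hat                    # canonical coefficients
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)          # sigma_hat^2
    return lam, alpha_hat, sigma2_hat
```

Because \(Q\) is orthogonal, \(\hat{\alpha }'\hat{\alpha } = \hat{\beta }'\hat{\beta }\), which several of the rules below exploit.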

1. \(k = \hat{\sigma }^2/\hat{\alpha }_{\max }^2\) (Hoerl and Kennard 1970b)

2. \( k = p\hat{\sigma }^2/(\varvec{\hat{\alpha }}'\varvec{\hat{\alpha }})\) (Hoerl et al. 1975)

3. \( k = \hat{\sigma }^2 (\sum {\lambda _i^2\hat{\alpha }_i^2})/{\sum {(\lambda _i\hat{\alpha }_i^2)^2}}\) (Hocking et al. 1976)

4. \(k_{(-1)}=0\) and, for \(i \ge 0\), compute iteratively \( k_{(i)} = p\hat{\sigma }^2/\{\varvec{\tilde{\alpha }}(k_{(i-1)})'\varvec{\tilde{\alpha }}(k_{(i-1)})\}\) until \((k_{(i)}-k_{(i-1)})/k_{(i-1)} \leqslant \delta \), where \(\delta =20 \cdot \text {tr}((\varvec{X}'\varvec{X})^{-1}/p)^{-1.3}\), and finally choose \(k = k_{(i)}\) (Hoerl and Kennard 1976)
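A sketch of this iteration under the canonical set-up (our naming and loop cap; the update and stopping threshold follow the displayed formulas, using \(\tilde{\alpha }_i(k) = \lambda _i\hat{\alpha }_i/(\lambda _i+k)\)):

```python
import numpy as np

def ridge_k_iterative(X, y, max_iter=100):
    """Hoerl-Kennard (1976)-style iteration, sketched: starting from k=0,
    repeat k <- p*s2 / ||alpha_tilde(k)||^2 until the relative change
    falls below the threshold delta."""
    n, p = X.shape
    XtX = X.T @ X
    lam, Q = np.linalg.eigh(XtX)
    alpha_ols = Q.T @ np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ np.linalg.solve(XtX, X.T @ y)
    s2 = resid @ resid / (n - p)
    # Stopping threshold: delta = 20 * (tr((X'X)^{-1})/p)^{-1.3}
    delta = 20 * (np.trace(np.linalg.inv(XtX)) / p) ** (-1.3)
    k = 0.0
    for _ in range(max_iter):
        # Canonical ridge coefficients: alpha_tilde_i = lam_i/(lam_i+k) * alpha_hat_i
        alpha_tilde = lam / (lam + k) * alpha_ols
        k_new = p * s2 / (alpha_tilde @ alpha_tilde)
        if k > 0 and (k_new - k) / k <= delta:
            return k_new
        k = k_new
    return k
```

The sequence \(k_{(i)}\) is non-decreasing, so the relative-change criterion always terminates the loop in practice.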

 

5. \( k = p\hat{\sigma }^2/(\sum {\lambda _i\hat{\alpha }_i^2})\) (Lawless and Wang 1976)

6. \(k\) satisfies \( \sum { \hat{\alpha }_i^2/(\hat{\sigma }^2/k + \hat{\sigma }^2/\lambda _i) } = p\) (Dempster et al. 1977)

7. \(k = \arg \min _{u \ge 0} \frac{1}{n}\sum {e_i(u)^2/\{1-h_{ii}(u)\}^2}\) (Allen 1974)

8. \(k = \arg \min _{u \ge 0} n\sum {e_i(u)^2}/[\sum {\{1-h_{ii}(u)\}}]^2\) (Golub et al. 1979)
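Rules 7 and 8 are defined by minimisation over \(u\). A hedged grid-search sketch of the generalised cross-validation rule (the grid is our assumption, not fixed by the criterion; a finer one-dimensional search could be substituted):

```python
import numpy as np

def ridge_k_gcv(X, y, grid=None):
    """GCV rule, sketched: minimise n * sum(e_i(u)^2) / [sum(1 - h_ii(u))]^2
    over candidate ridge parameters u, where H(u) = X (X'X + uI)^{-1} X'."""
    n, p = X.shape
    if grid is None:
        grid = np.logspace(-4, 4, 200)  # assumed candidate range
    XtX = X.T @ X
    best_k, best_gcv = grid[0], np.inf
    for u in grid:
        H = X @ np.linalg.solve(XtX + u * np.eye(p), X.T)  # hat matrix H(u)
        e = y - H @ y                                      # ridge residuals
        gcv = n * (e @ e) / np.sum(1 - np.diag(H)) ** 2
        if gcv < best_gcv:
            best_k, best_gcv = u, gcv
    return best_k
```

The PRESS rule (entry 7) differs only in the criterion: divide each squared residual by its own \(\{1-h_{ii}(u)\}^2\) instead of the squared average trace term.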

9. For the \(j\)th bootstrap sample of size \(n\), chosen randomly with replacement from the observations, ridge estimates are computed for each member in a pre-selected set \(\Theta \) of ridge parameter values, \(1 \le j \le B\). Let \(\hat{\varvec{Y}}\!_j(u)\) be the prediction vector for the unchosen observations \(\varvec{Y}\!_j\) from the ridge estimates with ridge parameter value \(u\). Choose \(k = \underset{u \in \Theta }{\arg \min } \frac{\sum _{j=1}^B{(\hat{\varvec{Y}}\!_{j}(u)-\varvec{Y}\!_{j})'(\hat{\varvec{Y}}\!_{j}(u)-\varvec{Y}\!_{j})}}{\sum _{j=1}^B \# \{\hbox {elements in} \,\varvec{Y}\!_j\}} \) (Delaney and Chatterjee 1986)
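This bootstrap rule can be sketched as follows; the candidate set, \(B\), and seed are user choices (our assumptions), not fixed by the rule:

```python
import numpy as np

def ridge_k_bootstrap(X, y, grid, B=50, seed=0):
    """Delaney-Chatterjee-style bootstrap choice, sketched: for each
    bootstrap sample, fit ridge estimates for every candidate u in `grid`
    and score u by squared prediction error on the observations NOT drawn
    into that sample; return the candidate with smallest pooled error."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sse = np.zeros(len(grid))   # pooled squared errors per candidate
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # bootstrap indices
        out = np.setdiff1d(np.arange(n), idx)   # unchosen observations
        if out.size == 0:
            continue
        Xb, yb = X[idx], y[idx]
        for j, u in enumerate(grid):
            beta = np.linalg.solve(Xb.T @ Xb + u * np.eye(p), Xb.T @ yb)
            err = y[out] - X[out] @ beta
            sse[j] += err @ err
    # The denominator in the displayed criterion is the same for every u,
    # so it does not affect the argmin and is omitted here.
    return grid[np.argmin(sse)]
```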

10. \( k = p\hat{\sigma }^2/[\sum \{ \hat{\alpha }_i^2/(1+\sqrt{1+\lambda _i\hat{\alpha }_i^2/\hat{\sigma }^2})\}]\) (Nomura 1988)

11. \( k = (r-2)\hat{\sigma }^2/(\varvec{\hat{\alpha }}'\varvec{\hat{\alpha }})\) (Brown 1994)

12. \( k = (r-2)\hat{\sigma }^2\text {tr}(\varvec{X}'\varvec{X})/(r\varvec{\hat{Y}}'\varvec{\hat{Y}})\), where \(\varvec{\hat{Y}}\) is the predicted \(\varvec{Y}\) using OLS (Brown 1994)

13. \( k = \hat{\sigma }^2/(\prod {\hat{\alpha }_i^2})^{\frac{1}{p}}\) (Kibria 2003)

14. \( k = \text {median}\left\{ \hat{\sigma }^2/\hat{\alpha }_i^2\right\} \) (Kibria 2003)

15. \( k = \lambda _{\max }\hat{\sigma }^2/\{\lambda _{\max }\hat{\alpha }_{\max }^2+(n-p)\hat{\sigma }^2\}\) (Khalaf and Shukur 2005)

16. \( k = \max \left\{ \lambda _i\hat{\sigma }^2/[(n-p)\hat{\sigma }^2+\lambda _i\hat{\alpha }_i^2]\right\} \) (Alkhamisi et al. 2006)

17. \(k = \arg \min _{u \ge 0} \hbox {ICOMP}(u)\) (Clark and Troskie 2006), where \(\begin{array}{ll} \hbox {ICOMP}(u) &{}= -2\log L(\varvec{\tilde{\beta }}(u))+d\log \left( \sum \limits _{i=1}^p{ \frac{\lambda _i}{(\lambda _i+u)^2} } \right) \\ &{}\quad -d\log (d) - \sum \limits _{i=1}^p{\log \left( \frac{\lambda _i}{(\lambda _i+u)^2} \right) } \end{array}\), in which \(L(\cdot )\) is the likelihood function and \(d = \text {rank of diag}\left\{ \frac{\lambda _1}{(\lambda _1+k)^2}, \ldots , \frac{\lambda _p}{(\lambda _p+k)^2} \right\} \)

18. \(k=\left\{ \begin{array}{ll} k_{5} &{} \hbox {if}\, k_{17}<k_{5}\\ k_{17} &{} \hbox {otherwise} \end{array} \right. \) (Clark and Troskie 2006)

19. \( k = \max \left\{ \hat{\sigma }^2/\hat{\alpha }_i^2+1/\lambda _i\right\} \) (Alkhamisi and Shukur 2007)

20. \( k = \left\{ \sum {\left( \hat{\sigma }^2/\hat{\alpha }_i^2+1/\lambda _i\right) }\right\} /p\) (Alkhamisi and Shukur 2007)

21. \( k = \text {median}\left\{ \hat{\sigma }^2/\hat{\alpha }_i^2+1/\lambda _i\right\} \) (Alkhamisi and Shukur 2007)

22. \( k = p\hat{\sigma }^2/(\sum {\lambda _i\hat{\alpha }_i^2})+1/\lambda _{\max }\) (Alkhamisi and Shukur 2007)

23. \( k = \left( \prod {\sqrt{\hat{\alpha }_i^2/\hat{\sigma }^2}}\right) ^{1/p}\) (Muniz and Kibria 2009)

24. \( k =\left( \prod {\sqrt{\hat{\sigma }^2/\hat{\alpha }_i^2}}\right) ^{1/p}\) (Muniz and Kibria 2009)

25. \( k = \text {median}\left\{ \sqrt{\hat{\alpha }_i^2/\hat{\sigma }^2}\right\} \) (Muniz and Kibria 2009)

26. \( k = \max \left\{ 0,\,p\hat{\sigma }^2/(\varvec{\hat{\alpha }}'\varvec{\hat{\alpha }})-1/(n \text {VIF}_{\max })\right\} \), where \(\hbox {VIF}_{\max }\) is the maximum among the variance inflation factors of the \(p\) regressors (Dorugade and Kashid 2010)
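Whichever rule supplies \(k\), the resulting estimator is the ridge solution \(\varvec{\tilde{\beta }}(k) = (\varvec{X}'\varvec{X}+k\varvec{I})^{-1}\varvec{X}'\varvec{Y}\); a one-function sketch (ours):

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimator beta_tilde(k) = (X'X + k I)^{-1} X'y.
    k = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

In canonical coordinates each component is shrunk by the factor \(\lambda _i/(\lambda _i+k)\), so \(\Vert \varvec{\tilde{\beta }}(k)\Vert \) decreases as \(k\) grows.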

 

About this article

Cite this article

Wong, K.Y., Chiu, S.N. An iterative approach to minimize the mean squared error in ridge regression. Comput Stat 30, 625–639 (2015). https://doi.org/10.1007/s00180-015-0557-y

