
Obtaining a threshold for the Stewart index and its extension to ridge regression


Abstract

The linear regression model is widely applied to measure the relationship between a dependent variable and a set of independent variables. When the independent variables are related to each other, the model is said to present collinearity. If the relationship involves the intercept and at least one of the independent variables, the collinearity is nonessential, while if it involves the independent variables (excluding the intercept), the collinearity is essential. The Stewart index allows the detection of both types of near multicollinearity. However, to the best of our knowledge, no thresholds have been established for this measure beyond which the multicollinearity should be considered worrying. This is the main goal of this paper, which presents a Monte Carlo simulation relating this measure to the condition number. An additional goal is to extend the Stewart index for application after estimation by ridge regression, which is widely applied, as an alternative to ordinary least squares (OLS), to estimate models with multicollinearity. This extension could also be applied to determine an appropriate value for the ridge factor.


Notes

  1. A variable is standardized by subtracting its mean and dividing by the square root of n times the variance (see the first sketch after these notes).

  2. To determine whether there is nonessential collinearity from the CV, the thresholds provided by Salmerón et al. (2020c) must be checked: \(CV (\mathbf {x}_{i}) < 0.1002506\) and \(CV (\mathbf {x}_{i}) < 0.06674082\). In addition, VIF values higher than 10 should be obtained to verify that the collinearity is essential (see the second sketch after these notes).

  3. This will be the default value used throughout the work.
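
As an illustration of footnote 1, here is a minimal R sketch (ours, not from the paper): with the population variance, the square root of n times the variance equals the norm of the centered variable, so the standardized variable has zero mean and unit length.

```r
# Standardization as in footnote 1: subtract the mean and divide by
# sqrt(n * variance). With the population variance (dividing by n),
# sqrt(n * variance) is the norm of the centered variable, so the
# result has zero mean and unit length.
standardize <- function(x) {
  n <- length(x)
  v <- mean((x - mean(x))^2)  # population variance
  (x - mean(x)) / sqrt(n * v)
}

set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)
z <- standardize(x)
c(mean = mean(z), squared_length = sum(z^2))  # approx. 0 and 1
```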

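The checks of footnote 2 can be sketched in the same hedged spirit, assuming the CV is the usual coefficient of variation (standard deviation over the absolute value of the mean) and computing the VIF from the \(R^{2}\) of the auxiliary regression; the helper names cv and vif below are illustrative, not from the paper.

```r
# Footnote 2 checks (hedged sketch): a CV below the thresholds of
# Salmeron et al. (2020c) suggests nonessential collinearity; a VIF
# above 10 suggests essential collinearity.
cv <- function(x) sd(x) / abs(mean(x))

vif <- function(X, i) {
  # R^2 of the auxiliary regression of x_i on the remaining regressors
  r2 <- summary(lm(X[, i] ~ X[, -i, drop = FALSE]))$r.squared
  1 / (1 - r2)
}

set.seed(1)
X <- cbind(x1 = rnorm(100, mean = 5, sd = 0.1),  # low variability: small CV
           x2 = rnorm(100),
           x3 = rnorm(100))
cv(X[, "x1"])  # below 0.1002506: possible nonessential collinearity
vif(X, 1)      # values above 10 would indicate essential collinearity
```
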
References

  • Belsley DA, Kuh E, Welsch RE (2005) Regression diagnostics: Identifying influential data and sources of collinearity, vol 571. John Wiley & Sons, London

  • García J, Salmerón R, García C, López Martín MdM (2016) Standardization of variables and collinearity diagnostic in ridge regression. International Statistical Review 84(2):245–266

  • Gorman J, Toman R (1966) Selection of variables for fitting equations to data. Technometrics 8:27–51

  • Hoerl A, Kennard R (1970a) Ridge regression: applications to nonorthogonal problems. Technometrics 12(1):69–82

  • Hoerl A, Kennard R (1970b) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Klein LR, Goldberger AS (1955) An econometric model of the United States 1929–1952. North-Holland Publishing Company, Amsterdam

  • Marquardt DW (1970) Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12(3):591–612

  • Marquardt DW, Snee RD (1975) Ridge regression in practice. The American Statistician 29(1):3–20

  • McDonald GC (2010) Tracing ridge regression coefficients. Wiley Interdisciplinary Reviews: Computational Statistics 2(6):695–703

  • R Core Team (2019) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Sánchez AR, Gómez RS, García C (2019) The coefficient of determination in the ridge regression. Communications in Statistics - Simulation and Computation. https://doi.org/10.1080/03610918.2019.1649421

  • Salmerón R, García C, García J (2018a) Variance inflation factor and condition number in multiple linear regression. Journal of Statistical Computation and Simulation 88(12):2365–2384

  • Salmerón R, García J, García C, del Mar López M (2018b) Transformation of variables and the condition number in ridge estimation. Computational Statistics 33(3):1497–1524

  • Salmerón R, García C, García J (2020) Comment on "A note on collinearity diagnostics and centering" by Velilla (2018). The American Statistician 74(1):68–71

  • Salmerón R, García C, García J (2020a) Detection of near-multicollinearity through centered and noncentered regression. Mathematics 8(6):931–948

  • Salmerón R, García C, García J (2020b) A guide to using the R package "multiColl" for detecting multicollinearity. Computational Economics. https://doi.org/10.1007/s10614-019-09967-y

  • Salmerón R, Rodríguez A, García C (2020c) Diagnosis and quantification of the non-essential collinearity. Computational Statistics 35:647–666

  • Silva T, Ribeiro A (2018) A new accelerated algorithm for ill-conditioned ridge regression problems. Computational and Applied Mathematics 37:1941–1958

  • Stewart GW (1987) Collinearity and least squares regression. Statistical Science 2(1):68–84


Author information


Correspondence to Ainara Rodríguez Sánchez.


Main properties of \(k_{i}^{2} (\lambda )\)

In this section, the main properties of \(k_{i}^{2} (\lambda )\) are proved.

Proposition 1

\(k_{i}^{2} (\lambda )\) is continuous at zero, that is, \(k_{i}^{2} (0) = k_{i}^{2}\).

Proof

Evident by comparing expressions (7) and (14); it also follows from expression (16). \(\square \)

Proposition 2

\(k_{i}^{2} (\lambda )\) decreases as a function of \(\lambda \).

Proof

To study the monotonicity, the decomposition given in McDonald (2010) is considered, where \(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} = \varvec{\varGamma } \mathbf {D}_{\mu + \lambda } \varvec{\varGamma }^{t}\) with \(\mathbf {D}_{\mu + \lambda } = diag ((\mu _{1} + \lambda ), \dots , (\mu _{p-1} + \lambda ))\), \(\varvec{\varGamma }\) is the \((p-1) \times (p-1)\) orthogonal matrix of eigenvectors of the \((p-1) \times (p-1)\) matrix \(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i}\), and \(\mu _{j}\) represents the eigenvalues of this matrix. Taking into account that \(\varvec{\gamma } = \mathbf {X}_{-i}^{t}\mathbf {x}_{i}\) and \(\varvec{\alpha } =\varvec{\gamma }^{t}\varvec{\varGamma }\):

$$\begin{aligned} k_{i}^{2} (\lambda )= & {} \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \varvec{\gamma }^{t}\varvec{\varGamma }\mathbf {D}_{\frac{1}{\mu + \lambda }}\varvec{\varGamma }^{t}\varvec{\gamma }} = \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \varvec{\alpha }\mathbf {D}_{\frac{1}{\mu + \lambda }}\varvec{\alpha }^{t}} \nonumber \\= & {} \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }}= \frac{1}{ 1 - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda )(\mu _{j} + \lambda )}} \nonumber \\= & {} \frac{1}{ 1 - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{ \lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j}}}, \quad i = 1,\dots , p. \end{aligned}$$
(15)

From this expression, since \(\lambda \ge 0\) and \(\mu _{j} > 0\), it follows that \(k_{i}^{2} (\lambda )\) decreases as a function of \(\lambda \); indeed:

$$\begin{aligned} \frac{\partial k_{i}^{2} (\lambda )}{\partial \lambda } = \frac{-\sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j} (2\lambda + \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j})}{(\lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j})^{2}}}{\left( 1 - \sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{ \lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j}}\right) ^{2}} < 0. \end{aligned}$$

\(\square \)

Proposition 3

\(k_{i}^{2} (\lambda ) \ge 1\), \(\forall \lambda \).

Proof

Suppose that:

$$\begin{aligned}&\frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }}< 1 \Leftrightarrow \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda< (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda } \\&\Leftrightarrow 0 < - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }. \end{aligned}$$

Since \(\lambda \ge 0\) and \(\mu _{j} > 0\), the last condition is impossible; hence the supposition is false and, consequently, \(k_{i}^{2} (\lambda ) \ge 1\), \(\forall \lambda \). In addition, from expression (15), it is evident that \(\lim \nolimits _{\lambda \rightarrow +\infty } k_{i}^{2}(\lambda ) = 1\). \(\square \)
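
Propositions 1-3 can also be checked numerically. The following R sketch (an illustration on simulated data, not part of the original proofs) computes \(k_{i}^{2}(\lambda )\) directly from the ratio form of expression (17):

```r
# k_i^2(lambda) from the ratio in expression (17):
# (x_i^t x_i + lambda) /
#   (x_i^t x_i + lambda - gamma^t (X_{-i}^t X_{-i} + lambda I)^{-1} gamma)
k2 <- function(X, i, lambda) {
  xi  <- X[, i]
  Xmi <- X[, -i, drop = FALSE]
  g   <- crossprod(Xmi, xi)  # gamma = X_{-i}^t x_i
  num <- sum(xi^2) + lambda
  den <- num - as.numeric(t(g) %*% solve(crossprod(Xmi) + lambda * diag(ncol(Xmi)), g))
  num / den
}

set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(1, z + rnorm(n, sd = 0.05), z + rnorm(n, sd = 0.05))  # collinear design

lambda <- c(0, 0.01, 0.1, 1, 10, 1e6)
k <- sapply(lambda, function(l) k2(X, i = 2, lambda = l))
k                  # k_i^2(0) is the Stewart index (Prop. 1)
all(diff(k) <= 0)  # TRUE: decreasing in lambda (Prop. 2)
all(k >= 1)        # TRUE: bounded below by 1, tending to 1 (Prop. 3)
```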

Proposition 4

\(k_{i}^{2}(\lambda )\) is related to \(k_{i}^{2}\).

Proof

Starting from expression (14) and taking into account that the augmented model (4) is given by \(\widetilde{\mathbf {x}}_{i} = \widetilde{\mathbf {X}}_{-i} \varvec{\delta }+\mathbf {v}\), where \(\widetilde{\mathbf {X}} = \left( \begin{array}{c} \mathbf {X}\\ \sqrt{\lambda }\mathbf {I}_{p} \end{array}\right) \), it is obtained that:

$$\begin{aligned} \widehat{\varvec{\delta }} = (\widetilde{\mathbf {X}}^{t}_{-i}\widetilde{\mathbf {X}}_{-i})^{-1} \widetilde{\mathbf {X}}^{t}_{-i}\widetilde{\mathbf {x}}_{i}= (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i}, \end{aligned}$$

and consequently:

$$\begin{aligned} SSR_{i} (\lambda )= & {} ( \widetilde{\mathbf {x}}_{i} - \widetilde{\mathbf {X}}_{-i} \widehat{\varvec{\delta }})^{t} (\widetilde{\mathbf {x}}_{i} - \widetilde{\mathbf {X}}_{-i} \widehat{\varvec{\delta }}) \\= & {} \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda - \mathbf {x}_{i}^{t}\mathbf {X}_{-i} (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i}. \end{aligned}$$

Then, adding and subtracting the same quantity \(\mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i}\), we obtain:

$$\begin{aligned} SSR_{i} (\lambda )= & {} \mathbf {x}_{i}^{t}\mathbf {x}_{i} - \mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i} + \lambda + \mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i} \\&- \mathbf {x}_{i}^{t}\mathbf {X}_{-i} (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i} \\= & {} SSR_{i} + \lambda + \mathbf {x}_{i}^{t}\mathbf {X}_{-i} [( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1} - ( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} +\lambda \mathbf {I}_{p-1})^{-1}] \mathbf {X}_{-i}^{t}\mathbf {x}_{i} \\= & {} SSR_{i} + \lambda + a_{i}(\lambda ), \end{aligned}$$

where \(SSR_{i}\) denotes the residual sum of squares of the auxiliary regression (8).

Then, because \(\mathbf {x}_{i}^{t} \mathbf {x}_{i} = SST_{i} + n \overline{\mathbf {x}}_{i}^{2}\), where \(SST_{i}\) denotes the total sum of squares of the auxiliary regression (8), we obtain:

$$\begin{aligned} k_{i}^{2} (\lambda )= & {} \frac{SST_{i} + n \overline{\mathbf {x}}_{i}^{2} + \lambda }{SSR_{i} + \lambda + a_{i}(\lambda )} = \frac{\frac{SST_{i}}{SSR_{i}}+\frac{n\overline{\mathbf {x}}_{i}^{2}}{SSR_{i}} + \frac{\lambda }{SSR_{i}}}{1 + \frac{\lambda + a_{i}(\lambda )}{SSR_{i}}}\nonumber \\= & {} \frac{1}{1 + b_{i}(\lambda )} \left( VIF_{i} + \frac{n\overline{\mathbf {x}}_{i}^{2}}{SSR_{i}} + \frac{\lambda }{SSR_{i}}\right) = \frac{1}{1 + b_{i}(\lambda )} \left( k_{i}^{2}+ \frac{\lambda }{SSR_{i}}\right) . \end{aligned}$$
(16)

where \(b_{i}(\lambda ) = \frac{\lambda + a_{i}(\lambda )}{SSR_{i}}\). \(\square \)
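
Expression (16) can be verified numerically. The following R sketch (ours, on simulated data) uses \(k_{i}^{2} = \mathbf {x}_{i}^{t}\mathbf {x}_{i}/SSR_{i}\) together with the definitions of \(a_{i}(\lambda )\) and \(b_{i}(\lambda )\) above:

```r
# Numerical check of the decomposition in expression (16).
set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(1, z + rnorm(n, sd = 0.05), rnorm(n))
i <- 2; lambda <- 0.5
xi  <- X[, i]
Xmi <- X[, -i, drop = FALSE]
g   <- crossprod(Xmi, xi)

SSRi <- sum(xi^2) - as.numeric(t(g) %*% solve(crossprod(Xmi), g))   # OLS residual SS
ai   <- as.numeric(t(g) %*% (solve(crossprod(Xmi)) -
          solve(crossprod(Xmi) + lambda * diag(ncol(Xmi)))) %*% g)  # a_i(lambda)
bi   <- (lambda + ai) / SSRi                                        # b_i(lambda)
k2   <- sum(xi^2) / SSRi                # k_i^2 = (SST_i + n * mean^2) / SSR_i

lhs <- (sum(xi^2) + lambda) / (SSRi + lambda + ai)  # k_i^2(lambda) via SSR_i(lambda)
rhs <- (k2 + lambda / SSRi) / (1 + bi)              # right-hand side of (16)
all.equal(lhs, rhs)                                 # TRUE
```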

Proposition 5

For standardized data, \(k_{i}^{2}(\lambda )\) coincides with \(VIF(i,\lambda )\).

Proof

On the one hand, expression (14) can be rewritten as:

$$\begin{aligned} k_{i}^{2}(\lambda ) = \frac{1}{1 - \frac{\mathbf {x}_{i}^{t} \mathbf {X}_{-i} \left( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} \right) ^{-1} \mathbf {X}_{-i}^{t}\mathbf {x}_{i}}{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }}, \end{aligned}$$
(17)

and, on the other hand, starting from the augmented model (4), for \(\widetilde{\mathbf {x}}_{i} = \widetilde{\mathbf {X}}_{-i} \varvec{\delta }+\mathbf {v}\), the following sum of squares explained and total are obtained:

$$\begin{aligned} \widetilde{SSE}= & {} \widehat{\varvec{\delta }}^{t} \widetilde{\mathbf {X}}_{-i} \widetilde{\mathbf {x}}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}\\= & {} \mathbf {x}_{i}^{t} \mathbf {X}_{-i} \left( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} \right) ^{-1} \mathbf {X}_{-i}^{t}\mathbf {x}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}, \\ \widetilde{SST}= & {} \widetilde{\mathbf {x}}_{i}^{t} \widetilde{\mathbf {x}}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2} = \mathbf {x}_{i}^{t} \mathbf {x}_{i} + \lambda - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}. \end{aligned}$$

Then, given that:

$$\begin{aligned} VIF(i,\lambda ) = \frac{1}{1 - \frac{\widetilde{SSE}}{\widetilde{SST}}}, \end{aligned}$$
(18)

it can be concluded that expressions (17) and (18) coincide if \(\overline{\widetilde{\mathbf {x}}}_{i} = 0\). Thus, because \(\overline{\widetilde{\mathbf {x}}}_{i} = \frac{n \overline{\mathbf {x}}_{i} + \sqrt{\lambda }}{n+p}\), for it to be equal to zero it is necessary that \(\overline{\mathbf {x}}_{i} = 0\) and \(\lambda = 0\); that is, that the ridge regression coincides with OLS.

Alternatively, the matrix \(\widetilde{\mathbf {X}}\) can be standardized, since in that case, all the implied variables have a mean equal to zero. This possibility is in agreement with the conclusion presented in García et al. (2016), where it was established that standardized data must be used for a correct application of the VIF in ridge regression. \(\square \)
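
The following R sketch (ours, on simulated data) illustrates the mechanism behind this proposition: once the columns of the augmented matrix are standardized, and hence have zero mean, the centring terms in expression (18) vanish and the VIF of the augmented regression coincides with the ratio of expression (17):

```r
# Standardizing the augmented matrix (4) makes all column means zero,
# so VIF(i, lambda) from (18) and the ratio in (17) become the same quantity.
set.seed(1)
n <- 50; p <- 3; lambda <- 0.5
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.05)   # induce collinearity

A <- rbind(X, sqrt(lambda) * diag(p))    # augmented matrix (4)
W <- apply(A, 2, function(x) (x - mean(x)) / sqrt(sum((x - mean(x))^2)))

i <- 1
wi  <- W[, i]
Wmi <- W[, -i, drop = FALSE]

# VIF via (18): with zero-mean columns, no centring terms are needed
h   <- crossprod(Wmi, wi)
SSE <- as.numeric(t(h) %*% solve(crossprod(Wmi), h))
vif18 <- 1 / (1 - SSE / sum(wi^2))

# The same quantity as the Stewart-type ratio of (17) on W
k2_17 <- sum(wi^2) / (sum(wi^2) - SSE)

# Independent check with lm(): the fitted intercept is essentially zero here
vif_lm <- 1 / (1 - summary(lm(wi ~ Wmi))$r.squared)
c(vif18 = vif18, k2 = k2_17, lm = vif_lm)  # all equal up to floating point
```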


Cite this article

Sánchez, A.R., Gómez, R.S. & García, C.G. Obtaining a threshold for the Stewart index and its extension to ridge regression. Comput Stat 36, 1011–1029 (2021). https://doi.org/10.1007/s00180-020-01047-2
