
Obtaining a threshold for the Stewart index and its extension to ridge regression


Abstract

The linear regression model is widely applied to measure the relationship between a dependent variable and a set of independent variables. When the independent variables are related to each other, the model is said to present collinearity. If the relationship involves the intercept and at least one of the independent variables, the collinearity is nonessential, while if it involves the independent variables (excluding the intercept), the collinearity is essential. The Stewart index allows the detection of both types of near multicollinearity. However, to the best of our knowledge, no thresholds have been established for this measure beyond which the multicollinearity should be considered worrying. This is the main goal of this paper, which presents a Monte Carlo simulation relating this measure to the condition number. An additional goal is to extend the Stewart index for application after estimation by ridge regression, which is widely applied, as an alternative to ordinary least squares (OLS), to estimate models with multicollinearity. This extension could also be applied to determine an appropriate value for the ridge factor.


Notes

  1. A variable is standardized by subtracting its mean and dividing by the square root of n times the variance (see the first sketch after these notes).

  2. To determine whether there is nonessential collinearity from the CV, the thresholds provided by Salmerón et al. (2020c) must be checked: \(CV (\mathbf {x}_{i}) < 0.1002506\) and \(CV (\mathbf {x}_{i}) < 0.06674082\). In addition, VIF values higher than 10 should be obtained to verify that the collinearity is essential (see the second sketch after these notes).

  3. This will be the default value used throughout the work.
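
As an illustration of footnote 1, here is a minimal R sketch (ours, not from the paper): with the population variance, the square root of n times the variance equals the norm of the centered variable, so the standardized variable has zero mean and unit length.

```r
# Standardization as in footnote 1: subtract the mean and divide by
# sqrt(n * variance). With the population variance (dividing by n),
# sqrt(n * variance) is the norm of the centered variable, so the
# result has zero mean and unit length.
standardize <- function(x) {
  n <- length(x)
  v <- mean((x - mean(x))^2)  # population variance
  (x - mean(x)) / sqrt(n * v)
}

set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)
z <- standardize(x)
c(mean = mean(z), squared_length = sum(z^2))  # approx. 0 and 1
```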

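The checks of footnote 2 can be sketched in the same hedged spirit, assuming the CV is the usual coefficient of variation (standard deviation over the absolute value of the mean) and computing the VIF from the \(R^{2}\) of the auxiliary regression; the helper names cv and vif below are illustrative, not from the paper.

```r
# Footnote 2 checks (hedged sketch): a CV below the thresholds of
# Salmeron et al. (2020c) suggests nonessential collinearity; a VIF
# above 10 suggests essential collinearity.
cv <- function(x) sd(x) / abs(mean(x))

vif <- function(X, i) {
  # R^2 of the auxiliary regression of x_i on the remaining regressors
  r2 <- summary(lm(X[, i] ~ X[, -i, drop = FALSE]))$r.squared
  1 / (1 - r2)
}

set.seed(1)
X <- cbind(x1 = rnorm(100, mean = 5, sd = 0.1),  # low variability: small CV
           x2 = rnorm(100),
           x3 = rnorm(100))
cv(X[, "x1"])  # below 0.1002506: possible nonessential collinearity
vif(X, 1)      # values above 10 would indicate essential collinearity
```
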
References

  • Belsley DA, Kuh E, Welsch RE (2005) Regression diagnostics: Identifying influential data and sources of collinearity, vol 571. John Wiley & Sons, London

  • García J, Salmerón R, García C, López Martín MdM (2016) Standardization of variables and collinearity diagnostic in ridge regression. International Statistical Review 84(2):245–266

  • Gorman J, Toman R (1966) Selection of variables for fitting equations to data. Technometrics 8:27–51

  • Hoerl A, Kennard R (1970a) Ridge regression: applications to nonorthogonal problems. Technometrics 12(1):69–82

  • Hoerl A, Kennard R (1970b) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Klein LR, Goldberger AS (1955) An econometric model of the United States 1929–1952. North-Holland Publishing Company, Amsterdam

  • Marquardt DW (1970) Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12(3):591–612

  • Marquardt DW, Snee RD (1975) Ridge regression in practice. The American Statistician 29(1):3–20

  • McDonald GC (2010) Tracing ridge regression coefficients. Wiley Interdisciplinary Reviews: Computational Statistics 2(6):695–703

  • R Core Team (2019) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Sánchez AR, Gómez RS, García C (2019) The coefficient of determination in the ridge regression. Communications in Statistics - Simulation and Computation. https://doi.org/10.1080/03610918.2019.1649421

  • Salmerón R, García C, García J (2018a) Variance inflation factor and condition number in multiple linear regression. Journal of Statistical Computation and Simulation 88(12):2365–2384

  • Salmerón R, García J, García C, del Mar López M (2018b) Transformation of variables and the condition number in ridge estimation. Computational Statistics 33(3):1497–1524

  • Salmerón R, García C, García J (2020) Comment on "A note on collinearity diagnostics and centering" by Velilla (2018). The American Statistician 74(1):68–71

  • Salmerón R, García C, García J (2020a) Detection of near-multicollinearity through centered and noncentered regression. Mathematics 8(6):931–948

  • Salmerón R, García C, García J (2020b) A guide to using the R package "multiColl" for detecting multicollinearity. Computational Economics. https://doi.org/10.1007/s10614-019-09967-y

  • Salmerón R, Rodríguez A, García C (2020c) Diagnosis and quantification of the non-essential collinearity. Computational Statistics 35:647–666

  • Silva T, Ribeiro A (2018) A new accelerated algorithm for ill-conditioned ridge regression problems. Computational and Applied Mathematics 37:1941–1958

  • Stewart GW (1987) Collinearity and least squares regression. Statistical Science 2(1):68–84


Author information


Correspondence to Ainara Rodríguez Sánchez.


Main properties of \(k_{i}^{2} (\lambda )\)

In this section, the main properties of \(k_{i}^{2} (\lambda )\) are proved.

Proposition 1

\(k_{i}^{2} (\lambda )\) is continuous at zero, that is, \(k_{i}^{2} (0) = k_{i}^{2}\).

Proof

Evident by comparing expressions (7) and (14); it also follows from expression (16). \(\square \)

Proposition 2

\(k_{i}^{2} (\lambda )\) decreases as a function of \(\lambda \).

Proof

To study the monotonicity, the decomposition given in McDonald (2010) is considered, where \(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} = \varvec{\varGamma } \mathbf {D}_{\mu + \lambda } \varvec{\varGamma }^{t}\) with \(\mathbf {D}_{\mu + \lambda } = diag ((\mu _{1} + \lambda ), \dots , (\mu _{p-1} + \lambda ))\), \(\varvec{\varGamma }\) is the \((p-1) \times (p-1)\) orthogonal matrix of eigenvectors of the \((p-1) \times (p-1)\) matrix \(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i}\), and \(\mu _{j}\) represents the eigenvalues of this matrix. Taking into account that \(\varvec{\gamma } = \mathbf {X}_{-i}^{t}\mathbf {x}_{i}\) and \(\varvec{\alpha } =\varvec{\gamma }^{t}\varvec{\varGamma }\):

$$\begin{aligned} k_{i}^{2} (\lambda )= & {} \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \varvec{\gamma }^{t}\varvec{\varGamma }\mathbf {D}_{\frac{1}{\mu + \lambda }}\varvec{\varGamma }^{t}\varvec{\gamma }} = \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \varvec{\alpha }\mathbf {D}_{\frac{1}{\mu + \lambda }}\varvec{\alpha }^{t}} \nonumber \\= & {} \frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }}= \frac{1}{ 1 - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda )(\mu _{j} + \lambda )}} \nonumber \\= & {} \frac{1}{ 1 - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{ \lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j}}}, \quad i = 1,\dots , p. \end{aligned}$$
(15)

From this expression, since \(\lambda \ge 0\) and \(\mu _{j} > 0\), it follows that \(k_{i}^{2} (\lambda )\) decreases as a function of \(\lambda \); indeed:

$$\begin{aligned} \frac{\partial k_{i}^{2} (\lambda )}{\partial \lambda } = \frac{-\sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j} (2\lambda + \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j})}{(\lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j})^{2}}}{\left( 1 - \sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{ \lambda ^{2} + \lambda (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \mu _{j}) +\mathbf {x}_{i}^{t}\mathbf {x}_{i} \mu _{j}}\right) ^{2}} < 0. \end{aligned}$$

\(\square \)

Proposition 3

\(k_{i}^{2} (\lambda ) \ge 1\), \(\forall \lambda \).

Proof

Suppose that:

$$\begin{aligned}&\frac{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }{(\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \nolimits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }}< 1 \Leftrightarrow \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda< (\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda ) - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda } \\&\Leftrightarrow 0 < - \sum \limits ^{p-1}_{j=1} \frac{\alpha ^{2}_{j}}{\mu _{j} + \lambda }. \end{aligned}$$

Since \(\lambda \ge 0\) and \(\mu _{j} > 0\), the last condition is impossible; hence the supposition is false and, consequently, \(k_{i}^{2} (\lambda ) \ge 1\), \(\forall \lambda \). In addition, from expression (15), it is evident that \(\lim \nolimits _{\lambda \rightarrow +\infty } k_{i}^{2}(\lambda ) = 1\). \(\square \)
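
Propositions 1-3 can also be checked numerically. The following R sketch (an illustration on simulated data, not part of the original proofs) computes \(k_{i}^{2}(\lambda )\) directly from the ratio form of expression (17):

```r
# k_i^2(lambda) from the ratio in expression (17):
# (x_i^t x_i + lambda) /
#   (x_i^t x_i + lambda - gamma^t (X_{-i}^t X_{-i} + lambda I)^{-1} gamma)
k2 <- function(X, i, lambda) {
  xi  <- X[, i]
  Xmi <- X[, -i, drop = FALSE]
  g   <- crossprod(Xmi, xi)  # gamma = X_{-i}^t x_i
  num <- sum(xi^2) + lambda
  den <- num - as.numeric(t(g) %*% solve(crossprod(Xmi) + lambda * diag(ncol(Xmi)), g))
  num / den
}

set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(1, z + rnorm(n, sd = 0.05), z + rnorm(n, sd = 0.05))  # collinear design

lambda <- c(0, 0.01, 0.1, 1, 10, 1e6)
k <- sapply(lambda, function(l) k2(X, i = 2, lambda = l))
k                  # k_i^2(0) is the Stewart index (Prop. 1)
all(diff(k) <= 0)  # TRUE: decreasing in lambda (Prop. 2)
all(k >= 1)        # TRUE: bounded below by 1, tending to 1 (Prop. 3)
```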

Proposition 4

\(k_{i}^{2}(\lambda )\) is related to \(k_{i}^{2}\).

Proof

Starting from expression (14) and taking into account that the augmented model (4) is given by \(\widetilde{\mathbf {x}}_{i} = \widetilde{\mathbf {X}}_{-i} \varvec{\delta }+\mathbf {v}\), where \(\widetilde{\mathbf {X}} = \left( \begin{array}{c} \mathbf {X}\\ \sqrt{\lambda }\mathbf {I}_{p} \end{array}\right) \), it is obtained that:

$$\begin{aligned} \widehat{\varvec{\delta }} = (\widetilde{\mathbf {X}}^{t}_{-i}\widetilde{\mathbf {X}}_{-i})^{-1} \widetilde{\mathbf {X}}^{t}_{-i}\widetilde{\mathbf {x}}_{i}= (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i}, \end{aligned}$$

and consequently:

$$\begin{aligned} SSR_{i} (\lambda )= & {} ( \widetilde{\mathbf {x}}_{i} - \widetilde{\mathbf {X}}_{-i} \widehat{\varvec{\delta }})^{t} (\widetilde{\mathbf {x}}_{i} - \widetilde{\mathbf {X}}_{-i} \widehat{\varvec{\delta }}) \\= & {} \mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda - \mathbf {x}_{i}^{t}\mathbf {X}_{-i} (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i}. \end{aligned}$$

Then, adding and subtracting the same quantity \(\mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i}\), we obtain:

$$\begin{aligned} SSR_{i} (\lambda )= & {} \mathbf {x}_{i}^{t}\mathbf {x}_{i} - \mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i} + \lambda + \mathbf {x}_{i}^{t}\mathbf {X}_{-i}(\mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1}\mathbf {X}_{-i}^{t}\mathbf {x}_{i} \\&- \mathbf {x}_{i}^{t}\mathbf {X}_{-i} (\mathbf {X}^{t}_{-i}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1})^{-1} \mathbf {X}^{t}_{-i}\mathbf {x}_{i} \\= & {} SSR_{i} + \lambda + \mathbf {x}_{i}^{t}\mathbf {X}_{-i} [( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i})^{-1} - ( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} +\lambda \mathbf {I}_{p-1})^{-1}] \mathbf {X}_{-i}^{t}\mathbf {x}_{i} \\= & {} SSR_{i} + \lambda + a_{i}(\lambda ), \end{aligned}$$

where \(SSR_{i}\) denotes the residual sum of squares of the auxiliary regression (8).

Then, because \(\mathbf {x}_{i}^{t} \mathbf {x}_{i} = SST_{i} + n \overline{\mathbf {x}}_{i}^{2}\), where \(SST_{i}\) denotes the total sum of squares of the auxiliary regression (8), we obtain:

$$\begin{aligned} k_{i}^{2} (\lambda )= & {} \frac{SST_{i} + n \overline{\mathbf {x}}_{i}^{2} + \lambda }{SSR_{i} + \lambda + a_{i}(\lambda )} = \frac{\frac{SST_{i}}{SSR_{i}}+\frac{n\overline{\mathbf {x}}_{i}^{2}}{SSR_{i}} + \frac{\lambda }{SSR_{i}}}{1 + \frac{\lambda + a_{i}(\lambda )}{SSR_{i}}}\nonumber \\= & {} \frac{1}{1 + b_{i}(\lambda )} \left( VIF_{i} + \frac{n\overline{\mathbf {x}}_{i}^{2}}{SSR_{i}} + \frac{\lambda }{SSR_{i}}\right) = \frac{1}{1 + b_{i}(\lambda )} \left( k_{i}^{2}+ \frac{\lambda }{SSR_{i}}\right) . \end{aligned}$$
(16)

where \(b_{i}(\lambda ) = \frac{\lambda + a_{i}(\lambda )}{SSR_{i}}\). \(\square \)
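
Expression (16) can be verified numerically. The following R sketch (ours, on simulated data) uses \(k_{i}^{2} = \mathbf {x}_{i}^{t}\mathbf {x}_{i}/SSR_{i}\) together with the definitions of \(a_{i}(\lambda )\) and \(b_{i}(\lambda )\) above:

```r
# Numerical check of the decomposition in expression (16).
set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(1, z + rnorm(n, sd = 0.05), rnorm(n))
i <- 2; lambda <- 0.5
xi  <- X[, i]
Xmi <- X[, -i, drop = FALSE]
g   <- crossprod(Xmi, xi)

SSRi <- sum(xi^2) - as.numeric(t(g) %*% solve(crossprod(Xmi), g))   # OLS residual SS
ai   <- as.numeric(t(g) %*% (solve(crossprod(Xmi)) -
          solve(crossprod(Xmi) + lambda * diag(ncol(Xmi)))) %*% g)  # a_i(lambda)
bi   <- (lambda + ai) / SSRi                                        # b_i(lambda)
k2   <- sum(xi^2) / SSRi                # k_i^2 = (SST_i + n * mean^2) / SSR_i

lhs <- (sum(xi^2) + lambda) / (SSRi + lambda + ai)  # k_i^2(lambda) via SSR_i(lambda)
rhs <- (k2 + lambda / SSRi) / (1 + bi)              # right-hand side of (16)
all.equal(lhs, rhs)                                 # TRUE
```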

Proposition 5

For standardized data, \(k_{i}^{2}(\lambda )\) coincides with \(VIF(i,\lambda )\).

Proof

On the one hand, expression (14) can be rewritten as:

$$\begin{aligned} k_{i}^{2}(\lambda ) = \frac{1}{1 - \frac{\mathbf {x}_{i}^{t} \mathbf {X}_{-i} \left( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} \right) ^{-1} \mathbf {X}_{-i}^{t}\mathbf {x}_{i}}{\mathbf {x}_{i}^{t}\mathbf {x}_{i} + \lambda }}, \end{aligned}$$
(17)

and, on the other hand, starting from the augmented model (4), for \(\widetilde{\mathbf {x}}_{i} = \widetilde{\mathbf {X}}_{-i} \varvec{\delta }+\mathbf {v}\), the following sum of squares explained and total are obtained:

$$\begin{aligned} \widetilde{SSE}= & {} \widehat{\varvec{\delta }}^{t} \widetilde{\mathbf {X}}_{-i} \widetilde{\mathbf {x}}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}\\= & {} \mathbf {x}_{i}^{t} \mathbf {X}_{-i} \left( \mathbf {X}_{-i}^{t}\mathbf {X}_{-i} + \lambda \mathbf {I}_{p-1} \right) ^{-1} \mathbf {X}_{-i}^{t}\mathbf {x}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}, \\ \widetilde{SST}= & {} \widetilde{\mathbf {x}}_{i}^{t} \widetilde{\mathbf {x}}_{i} - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2} = \mathbf {x}_{i}^{t} \mathbf {x}_{i} + \lambda - (n+p) \overline{\widetilde{\mathbf {x}}}_{i}^{2}. \end{aligned}$$

Then, given that:

$$\begin{aligned} VIF(i,\lambda ) = \frac{1}{1 - \frac{\widetilde{SSE}}{\widetilde{SST}}}, \end{aligned}$$
(18)

it can be concluded that expressions (17) and (18) coincide if \(\overline{\widetilde{\mathbf {x}}}_{i} = 0\). Thus, because \(\overline{\widetilde{\mathbf {x}}}_{i} = \frac{n \overline{\mathbf {x}}_{i} + \sqrt{\lambda }}{n+p}\), for it to be equal to zero it is necessary that \(\overline{\mathbf {x}}_{i} = 0\) and \(\lambda = 0\); that is, that the ridge regression coincides with OLS.

Alternatively, the matrix \(\widetilde{\mathbf {X}}\) can be standardized, since in that case, all the implied variables have a mean equal to zero. This possibility is in agreement with the conclusion presented in García et al. (2016), where it was established that standardized data must be used for a correct application of the VIF in ridge regression. \(\square \)
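
The following R sketch (ours, on simulated data) illustrates the mechanism behind this proposition: once the columns of the augmented matrix are standardized, and hence have zero mean, the centring terms in expression (18) vanish and the VIF of the augmented regression coincides with the ratio of expression (17):

```r
# Standardizing the augmented matrix (4) makes all column means zero,
# so VIF(i, lambda) from (18) and the ratio in (17) become the same quantity.
set.seed(1)
n <- 50; p <- 3; lambda <- 0.5
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.05)   # induce collinearity

A <- rbind(X, sqrt(lambda) * diag(p))    # augmented matrix (4)
W <- apply(A, 2, function(x) (x - mean(x)) / sqrt(sum((x - mean(x))^2)))

i <- 1
wi  <- W[, i]
Wmi <- W[, -i, drop = FALSE]

# VIF via (18): with zero-mean columns, no centring terms are needed
h   <- crossprod(Wmi, wi)
SSE <- as.numeric(t(h) %*% solve(crossprod(Wmi), h))
vif18 <- 1 / (1 - SSE / sum(wi^2))

# The same quantity as the Stewart-type ratio of (17) on W
k2_17 <- sum(wi^2) / (sum(wi^2) - SSE)

# Independent check with lm(): the fitted intercept is essentially zero here
vif_lm <- 1 / (1 - summary(lm(wi ~ Wmi))$r.squared)
c(vif18 = vif18, k2 = k2_17, lm = vif_lm)  # all equal up to floating point
```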


Cite this article

Sánchez, A.R., Gómez, R.S. & García, C.G. Obtaining a threshold for the Stewart index and its extension to ridge regression. Comput Stat 36, 1011–1029 (2021). https://doi.org/10.1007/s00180-020-01047-2
