A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 193)

Abstract

Although many statistical tools for analyzing the additive model rely on the back-fitting algorithm, it is well known that the algorithm is not guaranteed to converge or to yield a unique solution. Furthermore, running the algorithm on large datasets is computationally demanding. To address these issues, we propose a new optimization method for the additive model via partial generalized ridge regression. With the proposed method, all trends of the additive model are estimated simultaneously, and closed-form expressions are derived for the smoothing parameters that minimize the GCV criterion. In a numerical study, the new method outperformed the back-fitting algorithm in both predictive accuracy and computational efficiency.

Author information

Correspondence to Keisuke Fukui.


Appendix: The Proof of Equation (13)

A singular value decomposition of \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) is expressed as

$$\begin{aligned} (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2 = \varvec{G} \left( \begin{array}{c} \varvec{D}^{1/2}\\ \varvec{O}_{n-k,k} \end{array}\right) \varvec{Q}' = {\varvec{G}}_1 {\varvec{D}}^{1/2} {\varvec{Q}}', \end{aligned}$$
(14)

where \(\varvec{G}\) is an \(n\times n\) orthogonal matrix and \({\varvec{G}}_1\) is an \(n \times k\) matrix derived from the partition \({\varvec{G}}= ({\varvec{G}}_1, {\varvec{G}}_2)\). Note that \({\varvec{D}}^{1/2}\) is a \(k \times k\) diagonal matrix. From Eqs. (8) and (14), we can see that

$$\begin{aligned} {\varvec{H}}_{{\varvec{\Lambda }}}&= \varvec{P}_{{\varvec{W}}_1} + {\varvec{G}}\left( \begin{array}{cc} (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} &{}\varvec{O}_{k,n-k} \\ \varvec{O}_{n-k,k} &{}\varvec{O}_{n-k,n-k} \end{array}\right) {\varvec{G}}'. \end{aligned}$$

Hence, \(\mathrm{tr}({\varvec{H}}_{{\varvec{\Lambda }}})\) can be calculated as

$$\begin{aligned} \mathrm{tr}({\varvec{H}}_{{\varvec{\Lambda }}}) = \mathrm{tr}({\varvec{P}}_{{\varvec{W}}_1}) + \mathrm{tr}\{({\varvec{D}}+{\varvec{\Lambda }})^{-1} {\varvec{D}}\} = 3p + k + 1 - \sum _{j=1}^k \frac{\lambda _j}{d_j + \lambda _j}. \end{aligned}$$
(15)
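
As a quick numerical check of (14) and (15), the following sketch (Python with NumPy; not part of the paper) reconstructs \({\varvec{H}}_{{\varvec{\Lambda }}}\) from the singular value decomposition of \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) and compares both sides of (15). It assumes that \({\varvec{W}}_1\) has \(3p+1\) linearly independent columns (so that \(\mathrm{tr}({\varvec{P}}_{{\varvec{W}}_1}) = 3p+1\), as (15) implies) and that \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) has full column rank \(k\).

```python
# Minimal numerical sketch (not the authors' code): check Eqs. (14)-(15) with
# random matrices.  Assumptions: W1 has 3p + 1 linearly independent columns and
# (I - P_W1) W2 has full column rank k.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 60, 2, 8
W1 = rng.standard_normal((n, 3 * p + 1))
W2 = rng.standard_normal((n, k))
lam = rng.uniform(0.1, 5.0, size=k)            # ridge parameters lambda_1, ..., lambda_k

P_W1 = W1 @ np.linalg.pinv(W1)                 # projector onto the column space of W1
M = (np.eye(n) - P_W1) @ W2

# Thin SVD as in Eq. (14): M = G1 D^{1/2} Q'
G1, sqrt_d, Qt = np.linalg.svd(M, full_matrices=False)
d = sqrt_d ** 2                                # diagonal elements of D

# Hat matrix H_Lambda via the SVD (display following Eq. (14))
H_lam = P_W1 + G1 @ np.diag(d / (d + lam)) @ G1.T

print(np.allclose(M, G1 @ np.diag(sqrt_d) @ Qt))                              # Eq. (14)
print(np.isclose(np.trace(H_lam), 3 * p + k + 1 - np.sum(lam / (d + lam))))   # Eq. (15)
```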

Notice that \({\varvec{G}}_1 = ({\varvec{I}}_n - {\varvec{P}}_{{\varvec{W}}_1}) {\varvec{W}}_2 {\varvec{Q}}{\varvec{D}}^{-1/2}\) from (14), and \(\varvec{I}_n = \varvec{G} \varvec{G}' = \varvec{G}_1 \varvec{G}_1'+\varvec{G}_2 \varvec{G}_2' \). Hence, we have \(\varvec{G}_1 ' (\varvec{I}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}= {\varvec{z}}\) and

$$\begin{aligned}&\quad ~ {\varvec{y}}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\varvec{G}_2 \varvec{G}_2'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}){\varvec{y}}\nonumber \\&= {\varvec{y}}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})(\varvec{I}_n - \varvec{G}_1 \varvec{G}_1')(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}){\varvec{y}}\nonumber \\&= {\varvec{y}}'\left\{ \varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}- (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2 \varvec{Q} \varvec{D}^{-1}\varvec{Q}' {\varvec{W}}_2'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\right\} {\varvec{y}}\nonumber \\&= {\varvec{y}}' ({\varvec{I}}_n - {\varvec{H}}){\varvec{y}}= (n - 3p - k - 1) s_0^2. \end{aligned}$$

Using the above results and noting \(\varvec{P}_{{\varvec{W}}_1} {\varvec{G}}_1 = \varvec{O}_{n,k}\), we can derive the following equation:

$$\begin{aligned}&\quad ~ \Vert (\varvec{I}_n-\varvec{H}_{\varvec{\Lambda }}) \varvec{y}\Vert ^2 \nonumber \\&= \varvec{y} '(\varvec{I}_n - \varvec{H}_{\varvec{\Lambda }}) ^2 \varvec{y} \nonumber \\&= \varvec{y}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{G}}\left\{ \varvec{I}_n - \left( \begin{array}{cc} (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} &{}\varvec{O}_{k,n-k}\\ \varvec{O}_{n-k,k} &{}\varvec{O}_{n-k,n-k} \end{array} \right) \right\} ^2 {\varvec{G}}' (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\varvec{y} \nonumber \\&= {\varvec{z}}'\{ \varvec{I}_k - (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} \}^2 {\varvec{z}}+ (n - 3p - k-1) s_0^2 \nonumber \\&= (n-3p - k-1) s_0^2+ \sum _{j=1}^k \left( \frac{\lambda _j}{\lambda _j + d_j}z_j\right) ^2. \end{aligned}$$
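
The expansion above can be confirmed numerically in the same way; the sketch below (again illustrative, with random data and the same rank assumptions) computes \(s_0^2\) from \({\varvec{y}}'({\varvec{I}}_n - {\varvec{H}}){\varvec{y}}\) and \({\varvec{z}}= {\varvec{G}}_1'(\varvec{I}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}\), and compares \(\Vert (\varvec{I}_n-\varvec{H}_{\varvec{\Lambda }}) \varvec{y}\Vert ^2\) with the closed form on the last line of the display.

```python
# Minimal sketch (not the authors' code): verify the expansion of
# ||(I - H_Lambda) y||^2 with random data; same assumptions as before.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 60, 2, 8
W1 = rng.standard_normal((n, 3 * p + 1))
W2 = rng.standard_normal((n, k))
y = rng.standard_normal(n)
lam = rng.uniform(0.1, 5.0, size=k)

P_W1 = W1 @ np.linalg.pinv(W1)
G1, sqrt_d, _ = np.linalg.svd((np.eye(n) - P_W1) @ W2, full_matrices=False)
d = sqrt_d ** 2

z = G1.T @ (np.eye(n) - P_W1) @ y              # z = G1'(I - P_W1) y
H = P_W1 + G1 @ G1.T                           # hat matrix with Lambda = O
s0_sq = y @ (np.eye(n) - H) @ y / (n - 3 * p - k - 1)

H_lam = P_W1 + G1 @ np.diag(d / (d + lam)) @ G1.T
lhs = np.sum(((np.eye(n) - H_lam) @ y) ** 2)
rhs = (n - 3 * p - k - 1) * s0_sq + np.sum((lam * z / (lam + d)) ** 2)
print(np.isclose(lhs, rhs))                    # True
```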

By substituting (15) and the above result into (9), we obtain

$$\begin{aligned} \mathrm{GCV}({\varvec{\Lambda }}) = \dfrac{(n - 3p - k - 1) s_0^2 + \sum _{j=1}^{k} \{ \lambda _j z_j / (\lambda _j + d_j) \}^2 }{n [1 - \{ 3p + k + 1 -\sum _{j=1}^k \lambda _j / (\lambda _j + d_j) \}/n ] ^2 } . \end{aligned}$$
(16)
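
Written in this form, the criterion can be evaluated from the \(k\) pairs \((d_j, z_j)\), \(s_0^2\), and the ridge parameters alone, without forming any \(n \times n\) matrix; a minimal sketch (the function name and argument order are illustrative) is:

```python
# Minimal sketch (illustrative, not the authors' code): the GCV criterion of
# Eq. (16) evaluated from d_j, z_j, lambda_j, n, p and s0^2 only.
import numpy as np

def gcv(lam, d, z, n, p, s0_sq):
    k = len(z)
    shrink = lam / (lam + d)                   # lambda_j / (lambda_j + d_j)
    rss = (n - 3 * p - k - 1) * s0_sq + np.sum((shrink * z) ** 2)
    edf = 3 * p + k + 1 - np.sum(shrink)       # tr(H_Lambda) from Eq. (15)
    return rss / (n * (1.0 - edf / n) ** 2)
```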

Let \({\varvec{\delta }}= (\delta _1, \dots , \delta _k)'\) be the k-dimensional vector whose jth element is defined as

$$ \delta _j = \frac{\lambda _j}{d_j + \lambda _j}. $$

Note that \(\delta _j \in [0,1]\) since \(d_j \ge 0\) and \(\lambda _j \ge 0\). Then, the GCV criterion in (16) is expressed as the following function with respect to \({\varvec{\delta }}\):

$$\begin{aligned} \mathrm{GCV}({\varvec{\Lambda }}) = f ({\varvec{\delta }}) = \frac{r({\varvec{\delta }})}{c({\varvec{\delta }})^2}, \end{aligned}$$
(17)

where the functions \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are given by

$$\begin{aligned} r({\varvec{\delta }}) = \frac{(n-3p - k-1) s_0^2 + \sum _{j=1}^k z_j^2 \delta _j^2 }{n}, \ c({\varvec{\delta }}) = 1- \frac{1}{n}\left( 3p + k + 1 - \sum _{j=1}^k \delta _j\right) . \end{aligned}$$

Here, \(z_1,\dots ,z_k\) are given in (10). Let \(\hat{{\varvec{\delta }}} = (\hat{\delta }_1, \dots , \hat{\delta }_k)'\) be the minimizer of \(f({\varvec{\delta }})\) in (17), i.e.,

$$ \hat{{\varvec{\delta }}} = \arg \min _{{\varvec{\delta }}\in [0,1]^k} f({\varvec{\delta }}), $$

where \([0,1]^k\) is the kth Cartesian power of the set \([0, 1]\). Notice that \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are differentiable with respect to each \(\delta _j\). Thus, we obtain

$$ \frac{\partial }{\partial \delta _j} f({\varvec{\delta }}) = \frac{2}{nc({\varvec{\delta }})^3}\left\{ c({\varvec{\delta }}) z_j^2 \delta _j - r({\varvec{\delta }})\right\} . $$

Hence, noting that \(c({\varvec{\delta }})\) is finite, we find a necessary condition for \(\hat{{\varvec{\delta }}}\) as

$$\begin{aligned} \hat{\delta }_j = \left\{ \begin{array}{ll} 1 &{} (h(\hat{{\varvec{\delta }}}) \ge z_j^2) \\ h(\hat{{\varvec{\delta }}})/z_j^2 &{}(h(\hat{{\varvec{\delta }}}) < z_j^2) \end{array} \right. , \end{aligned}$$

where \(h({\varvec{\delta }}) = r({\varvec{\delta }})/c({\varvec{\delta }}) > 0\).

On the other hand, let \(\mathcal {H}= \{ {\varvec{\delta }}\in [0, 1]^k \mid {\varvec{\delta }}= {\varvec{\delta }}^{\star } (h) \text{ for some } h \in \mathbb {R}_+ \}\), where \({\varvec{\delta }}^\star (h)\) is the k-dimensional vector whose jth element is defined as

$$ \delta _j^\star (h) = \left\{ \begin{array}{ll} 1 &{} (h \ge z_j^2) \\ h /z_j^2 &{}(h < z_j^2) \end{array} \right. , $$

and \(\mathbb {R}_+\) is the set of nonnegative real numbers. Then, it follows from \(\mathcal {H}\subseteq [0, 1]^k\) and \(\hat{{\varvec{\delta }}} \in \mathcal {H}\) that

$$ f (\hat{{\varvec{\delta }}}) = \min _{{\varvec{\delta }}\in [0, 1]^k} f ({\varvec{\delta }}) \le \min _{{\varvec{\delta }}\in \mathcal {H}} f ({\varvec{\delta }}) = \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)), \quad f (\hat{{\varvec{\delta }}}) \ge \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)). $$

Hence, we have

$$\begin{aligned} \hat{{\varvec{\delta }}} = {\varvec{\delta }}^\star (\hat{h})\quad \left( \hat{h} = \arg \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)) \right) . \end{aligned}$$
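
In code, \({\varvec{\delta }}^\star (h)\) is a simple element-wise truncation; a minimal sketch (illustrative names, with z2 holding \(z_1^2, \ldots , z_k^2\)) is:

```python
# Minimal sketch (illustrative): delta*(h) element-wise, where z2 holds
# z_1^2, ..., z_k^2.
import numpy as np

def delta_star(h, z2):
    return np.where(h >= z2, 1.0, h / z2)
```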

Using this result, we minimize the following function:

$$\begin{aligned} f ({\varvec{\delta }}^\star (h)) = f_1 (h) = \dfrac{ r_1 (h) }{ c_1 (h)^2 }, \end{aligned}$$

where \(r_1(h)=r({\varvec{\delta }}^\star (h))\) and \(c_1(h)=c({\varvec{\delta }}^\star (h))\), which can be calculated as

$$\begin{aligned} r_1 (h)&= \frac{1}{n} \left[ (n-3p - k-1) s_0^2 + \sum _{j=1}^k \left\{ I (h<z_j^2 ) \left( \dfrac{h}{z_j^2} - 1 \right) + 1 \right\} ^2 z_j^2 \right] , \\ c_1 (h)&= 1- \frac{1}{n} \left[ 3p + k + 1 - \sum _{j=1}^k \left\{ I (h <z_j^2 ) \left( \dfrac{h}{z_j^2} - 1 \right) + 1 \right\} \right] . \end{aligned}$$
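
A direct implementation of \(f_1(h)\), again as a hedged sketch with illustrative names, reads:

```python
# Minimal sketch (illustrative): f_1(h) = r_1(h) / c_1(h)^2 from the display
# above, given z_1^2, ..., z_k^2, n, p and s_0^2.
import numpy as np

def f1(h, z2, n, p, s0_sq):
    k = len(z2)
    delta = np.where(h >= z2, 1.0, h / z2)     # = delta*(h)
    r1 = ((n - 3 * p - k - 1) * s0_sq + np.sum(delta ** 2 * z2)) / n
    c1 = 1.0 - (3 * p + k + 1 - np.sum(delta)) / n
    return r1 / c1 ** 2
```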

Suppose that \(h \in R_a\), where \(a \in \{0,1,\ldots ,k\}\) and \(R_a\) is the range defined by (12). Notice that \(u_j\) is the jth order statistic of \(z_1^2, \ldots , z_k^2\). Then, we have

$$ f_1 (h) = f_{1a} (h) = \dfrac{ r_{1a} (h) }{ c_{1a} (h)^2 }\quad (h \in R_a), $$

where the functions \(r_{1a}(h)\) and \(c_{1a}(h)\) are given by

$$\begin{aligned} r_1 (h)&= r_{1a} (h) = \frac{1}{n} \left\{ (n-3p - k-1+a) s_a^2 + \ell _a h^2 \right\} , \\ c_1 (h)&= c_{1a} (h) = 1 - \frac{1}{n} ( 3p + k + 1 - a - \ell _a h ), \end{aligned}$$

where \(s_a^2\) is given in (11) and \(\ell _a = \sum _{j=a+1}^k 1/u_j\). Letting \(g_a (h) = (n - 3p - k - 1 + a) (h - s_a^2)\), a simple calculation yields

$$\begin{aligned} \dfrac{d}{dh} f_{1a} (h)&= \dfrac{2 \ell _a}{n^2 c_{1a} (h)^3} g_a (h). \end{aligned}$$

Here, we note that \(g_a (u_{a+1})= g_{a+1} (u_{a+1})\) \((a = 0, \dots , k-1)\). Moreover, \(2 \ell _a/\{n^2 c_{1a} (h)^3\}\) is positive, \(\lim _{h \rightarrow 0} g_0 (h) < 0\), and \(g_a (h)\) \((h \in R_a)\) is monotonically increasing in \(h\), since \(n-3p-k-1>0\) and \(a \ge 0\). Consequently, Eq. (13) is obtained, after some calculation, by combining the expression for \(\hat{\delta }_j\) with the stationary point \(h^*=s_a^2\) satisfying \(d f_{1a}(h)/d h|_{h=h^*}=0\).
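
Because each \(g_a\) is increasing and the pieces agree at the breakpoints \(u_{a+1}\), the minimizer can be located by scanning the candidate stationary points \(h^* = s_a^2\). The sketch below is one possible implementation; since Eqs. (11) and (12) are not reproduced in this excerpt, it assumes, consistently with the display for \(r_{1a}(h)\) above, that \((n-3p-k-1+a)s_a^2 = (n-3p-k-1)s_0^2 + u_1 + \cdots + u_a\) and that \(R_a = [u_a, u_{a+1})\) with \(u_0 = 0\) and \(u_{k+1} = \infty\); both are stated here as assumptions, not as the paper's definitions.

```python
# Minimal sketch (illustrative, not the authors' code) of the non-iterative
# minimization described above.  ASSUMPTIONS (Eqs. (11)-(12) are not shown in
# this excerpt): s_a^2 satisfies
#   (n - 3p - k - 1 + a) s_a^2 = (n - 3p - k - 1) s_0^2 + u_1 + ... + u_a,
# and R_a = [u_a, u_{a+1}) with u_0 = 0, u_{k+1} = +infinity.
import numpy as np

def optimal_h(z2, n, p, s0_sq):
    """Return h_hat minimizing f_1 by scanning the stationary points h* = s_a^2."""
    k = len(z2)
    u = np.sort(z2)                                          # u_1 <= ... <= u_k
    m = n - 3 * p - k - 1                                    # assumed positive
    s_sq = (m * s0_sq + np.concatenate(([0.0], np.cumsum(u)))) / (m + np.arange(k + 1))
    lo = np.concatenate(([0.0], u))                          # left ends of R_0, ..., R_k
    hi = np.concatenate((u, [np.inf]))                       # right ends
    for a in range(k + 1):
        # g_a vanishes at s_a^2; the minimizer is the stationary point that
        # actually lies inside its own range R_a.
        if lo[a] <= s_sq[a] < hi[a]:
            return s_sq[a]
    return s_sq[k]                                           # fallback

def optimal_delta(z2, n, p, s0_sq):
    h_hat = optimal_h(z2, n, p, s0_sq)
    return np.where(h_hat >= z2, 1.0, h_hat / z2)            # delta_hat = delta*(h_hat)
```

A quick sanity check is to compare optimal_h against a dense grid evaluation of f1 from the previous sketch.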


Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.


Cite this paper

Fukui, K., Ohishi, M., Yamamura, M., Yanagihara, H. (2020). A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_24
