A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 193)

Abstract

Although many statistical tools for analyzing the additive model rely on the back-fitting algorithm, it is well known that the algorithm is not guaranteed to converge or to yield a unique solution. Furthermore, running the algorithm on large datasets is computationally demanding. To address these issues, we propose a new optimization method for the additive model via partial generalized ridge regression. With the proposed method, all trends of the additive model are estimated simultaneously, and closed-form expressions are derived for the smoothing parameters that minimize the GCV criterion. In a numerical study, the new method outperformed the back-fitting algorithm in both predictive accuracy and computational efficiency.

Author information

Correspondence to Keisuke Fukui.


Appendix: The Proof of Equation (13)

A singular value decomposition of \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) is expressed as

$$\begin{aligned} (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2 = \varvec{G} \left( \begin{array}{c} \varvec{D}^{1/2}\\ \varvec{O}_{n-k,k} \end{array}\right) \varvec{Q}' = {\varvec{G}}_1 {\varvec{D}}^{1/2} {\varvec{Q}}', \end{aligned}$$
(14)

where \(\varvec{G}\) is an \(n\times n\) orthogonal matrix and \({\varvec{G}}_1\) is an \(n \times k\) matrix derived from the partition \({\varvec{G}}= ({\varvec{G}}_1, {\varvec{G}}_2)\). Note that \({\varvec{D}}^{1/2}\) is a \(k \times k\) diagonal matrix. From Eqs. (8) and (14), we can see that

$$\begin{aligned} {\varvec{H}}_{{\varvec{\Lambda }}}&= \varvec{P}_{{\varvec{W}}_1} + {\varvec{G}}\left( \begin{array}{cc} (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} &{}\varvec{O}_{k,n-k} \\ \varvec{O}_{n-k,k} &{}\varvec{O}_{n-k,n-k} \end{array}\right) {\varvec{G}}'. \end{aligned}$$

Hence, \(\mathrm{tr}({\varvec{H}}_{{\varvec{\Lambda }}})\) can be calculated as

$$\begin{aligned} \mathrm{tr}({\varvec{H}}_{{\varvec{\Lambda }}}) = \mathrm{tr}({\varvec{P}}_{{\varvec{W}}_1}) + \mathrm{tr}\{({\varvec{D}}+{\varvec{\Lambda }})^{-1} {\varvec{D}}\} = 3p + k + 1 - \sum _{j=1}^k \frac{\lambda _j}{d_j + \lambda _j}. \end{aligned}$$
(15)
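
As a quick numerical check of (14) and (15), the following sketch (Python with NumPy; not part of the paper) reconstructs \({\varvec{H}}_{{\varvec{\Lambda }}}\) from the singular value decomposition of \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) and compares both sides of (15). It assumes that \({\varvec{W}}_1\) has \(3p+1\) linearly independent columns (so that \(\mathrm{tr}({\varvec{P}}_{{\varvec{W}}_1}) = 3p+1\), as (15) implies) and that \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) has full column rank \(k\).

```python
# Minimal numerical sketch (not the authors' code): check Eqs. (14)-(15) with
# random matrices.  Assumptions: W1 has 3p + 1 linearly independent columns and
# (I - P_W1) W2 has full column rank k.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 60, 2, 8
W1 = rng.standard_normal((n, 3 * p + 1))
W2 = rng.standard_normal((n, k))
lam = rng.uniform(0.1, 5.0, size=k)            # ridge parameters lambda_1, ..., lambda_k

P_W1 = W1 @ np.linalg.pinv(W1)                 # projector onto the column space of W1
M = (np.eye(n) - P_W1) @ W2

# Thin SVD as in Eq. (14): M = G1 D^{1/2} Q'
G1, sqrt_d, Qt = np.linalg.svd(M, full_matrices=False)
d = sqrt_d ** 2                                # diagonal elements of D

# Hat matrix H_Lambda via the SVD (display following Eq. (14))
H_lam = P_W1 + G1 @ np.diag(d / (d + lam)) @ G1.T

print(np.allclose(M, G1 @ np.diag(sqrt_d) @ Qt))                              # Eq. (14)
print(np.isclose(np.trace(H_lam), 3 * p + k + 1 - np.sum(lam / (d + lam))))   # Eq. (15)
```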

Notice that \({\varvec{G}}_1 = ({\varvec{I}}_n - {\varvec{P}}_{{\varvec{W}}_1}) {\varvec{W}}_2 {\varvec{Q}}{\varvec{D}}^{-1/2}\) from (14), and \(\varvec{I}_n = \varvec{G} \varvec{G}' = \varvec{G}_1 \varvec{G}_1'+\varvec{G}_2 \varvec{G}_2' \). Hence, we have \(\varvec{G}_1 ' (\varvec{I}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}= {\varvec{z}}\) and

$$\begin{aligned}&\quad ~ {\varvec{y}}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\varvec{G}_2 \varvec{G}_2'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}){\varvec{y}}\nonumber \\&= {\varvec{y}}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})(\varvec{I}_n - \varvec{G}_1 \varvec{G}_1')(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}){\varvec{y}}\nonumber \\&= {\varvec{y}}'\left\{ \varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}- (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2 \varvec{Q} \varvec{D}^{-1}\varvec{Q}' {\varvec{W}}_2'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\right\} {\varvec{y}}\nonumber \\&= {\varvec{y}}' ({\varvec{I}}_n - {\varvec{H}}){\varvec{y}}= (n - 3p - k - 1) s_0^2. \end{aligned}$$

Using the above results and noting \(\varvec{P}_{{\varvec{W}}_1} {\varvec{G}}_1 = \varvec{O}_{n,k}\), we can derive the following equation:

$$\begin{aligned}&\quad ~ \Vert (\varvec{I}_n-\varvec{H}_{\varvec{\Lambda }}) \varvec{y}\Vert ^2 \nonumber \\&= \varvec{y} '(\varvec{I}_n - \varvec{H}_{\varvec{\Lambda }}) ^2 \varvec{y} \nonumber \\&= \varvec{y}'(\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{G}}\left\{ \varvec{I}_n - \left( \begin{array}{cc} (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} &{}\varvec{O}_{k,n-k}\\ \varvec{O}_{n-k,k} &{}\varvec{O}_{n-k,n-k} \end{array} \right) \right\} ^2 {\varvec{G}}' (\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1})\varvec{y} \nonumber \\&= {\varvec{z}}'\{ \varvec{I}_k - (\varvec{D} + \varvec{\Lambda })^{-1} \varvec{D} \}^2 {\varvec{z}}+ (n - 3p - k-1) s_0^2 \nonumber \\&= (n-3p - k-1) s_0^2+ \sum _{j=1}^k \left( \frac{\lambda _j}{\lambda _j + d_j}z_j\right) ^2. \end{aligned}$$
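
The expansion above can be confirmed numerically in the same way; the sketch below (again illustrative, with random data and the same rank assumptions) computes \(s_0^2\) from \({\varvec{y}}'({\varvec{I}}_n - {\varvec{H}}){\varvec{y}}\) and \({\varvec{z}}= {\varvec{G}}_1'(\varvec{I}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}\), and compares \(\Vert (\varvec{I}_n-\varvec{H}_{\varvec{\Lambda }}) \varvec{y}\Vert ^2\) with the closed form on the last line of the display.

```python
# Minimal sketch (not the authors' code): verify the expansion of
# ||(I - H_Lambda) y||^2 with random data; same assumptions as before.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 60, 2, 8
W1 = rng.standard_normal((n, 3 * p + 1))
W2 = rng.standard_normal((n, k))
y = rng.standard_normal(n)
lam = rng.uniform(0.1, 5.0, size=k)

P_W1 = W1 @ np.linalg.pinv(W1)
G1, sqrt_d, _ = np.linalg.svd((np.eye(n) - P_W1) @ W2, full_matrices=False)
d = sqrt_d ** 2

z = G1.T @ (np.eye(n) - P_W1) @ y              # z = G1'(I - P_W1) y
H = P_W1 + G1 @ G1.T                           # hat matrix with Lambda = O
s0_sq = y @ (np.eye(n) - H) @ y / (n - 3 * p - k - 1)

H_lam = P_W1 + G1 @ np.diag(d / (d + lam)) @ G1.T
lhs = np.sum(((np.eye(n) - H_lam) @ y) ** 2)
rhs = (n - 3 * p - k - 1) * s0_sq + np.sum((lam * z / (lam + d)) ** 2)
print(np.isclose(lhs, rhs))                    # True
```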

By substituting (15) and the above result into (9), we obtain

$$\begin{aligned} \mathrm{GCV}({\varvec{\Lambda }}) = \dfrac{(n - 3p - k - 1) s_0^2 + \sum _{j=1}^{k} \{ \lambda _j z_j / (\lambda _j + d_j) \}^2 }{n [1 - \{ 3p + k + 1 -\sum _{j=1}^k \lambda _j / (\lambda _j + d_j) \}/n ] ^2 } . \end{aligned}$$
(16)
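
Written in this form, the criterion can be evaluated from the \(k\) pairs \((d_j, z_j)\), \(s_0^2\), and the ridge parameters alone, without forming any \(n \times n\) matrix; a minimal sketch (the function name and argument order are illustrative) is:

```python
# Minimal sketch (illustrative, not the authors' code): the GCV criterion of
# Eq. (16) evaluated from d_j, z_j, lambda_j, n, p and s0^2 only.
import numpy as np

def gcv(lam, d, z, n, p, s0_sq):
    k = len(z)
    shrink = lam / (lam + d)                   # lambda_j / (lambda_j + d_j)
    rss = (n - 3 * p - k - 1) * s0_sq + np.sum((shrink * z) ** 2)
    edf = 3 * p + k + 1 - np.sum(shrink)       # tr(H_Lambda) from Eq. (15)
    return rss / (n * (1.0 - edf / n) ** 2)
```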

Let \({\varvec{\delta }}= (\delta _1, \dots , \delta _k)'\) be the k-dimensional vector whose jth element is defined as

$$ \delta _j = \frac{\lambda _j}{d_j + \lambda _j}. $$

Note that \(\delta _j \in [0,1]\) since \(d_j \ge 0\) and \(\lambda _j \ge 0\). Then, the GCV criterion in (16) is expressed as the following function with respect to \({\varvec{\delta }}\):

$$\begin{aligned} \mathrm{GCV}({\varvec{\Lambda }}) = f ({\varvec{\delta }}) = \frac{r({\varvec{\delta }})}{c({\varvec{\delta }})^2}, \end{aligned}$$
(17)

where the functions \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are given by

$$\begin{aligned} r({\varvec{\delta }}) = \frac{(n-3p - k-1) s_0^2 + \sum _{j=1}^k z_j^2 \delta _j^2 }{n}, \ c({\varvec{\delta }}) = 1- \frac{1}{n}\left( 3p + k + 1 - \sum _{j=1}^k \delta _j\right) . \end{aligned}$$

Here, \(z_1,\dots ,z_k\) are given in (10). Let \(\hat{{\varvec{\delta }}} = (\hat{\delta }_1, \dots , \hat{\delta }_k)'\) be the minimizer of \(f({\varvec{\delta }})\) in (17), i.e.,

$$ \hat{{\varvec{\delta }}} = \arg \min _{{\varvec{\delta }}\in [0,1]^k} f({\varvec{\delta }}), $$

where \([0,1]^k\) is the kth Cartesian power of the set \([0, 1]\). Notice that \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are differentiable with respect to each \(\delta _j\). Thus, we obtain

$$ \frac{\partial }{\partial \delta _j} f({\varvec{\delta }}) = \frac{2}{nc({\varvec{\delta }})^3}\left\{ c({\varvec{\delta }}) z_j^2 \delta _j - r({\varvec{\delta }})\right\} . $$

Hence, noting that \(c({\varvec{\delta }})\) is finite, we find a necessary condition for \(\hat{{\varvec{\delta }}}\) as

$$\begin{aligned} \hat{\delta }_j = \left\{ \begin{array}{ll} 1 &{} (h(\hat{{\varvec{\delta }}}) \ge z_j^2) \\ h(\hat{{\varvec{\delta }}})/z_j^2 &{}(h(\hat{{\varvec{\delta }}}) < z_j^2) \end{array} \right. , \end{aligned}$$

where \(h({\varvec{\delta }}) = r({\varvec{\delta }})/c({\varvec{\delta }}) > 0\).

On the other hand, let \(\mathcal {H}= \{ {\varvec{\delta }}\in [0, 1]^k \mid {\varvec{\delta }}= {\varvec{\delta }}^{\star } (h) \text{ for some } h \in \mathbb {R}_+ \}\), where \({\varvec{\delta }}^\star (h)\) is the k-dimensional vector whose jth element is defined as

$$ \delta _j^\star (h) = \left\{ \begin{array}{ll} 1 &{} (h \ge z_j^2) \\ h /z_j^2 &{}(h < z_j^2) \end{array} \right. , $$

and \(\mathbb {R}_+\) is the set of nonnegative real numbers. Then, it follows from \(\mathcal {H}\subseteq [0, 1]^k\) and \(\hat{{\varvec{\delta }}} \in \mathcal {H}\) that

$$ f (\hat{{\varvec{\delta }}}) = \min _{{\varvec{\delta }}\in [0, 1]^k} f ({\varvec{\delta }}) \le \min _{{\varvec{\delta }}\in \mathcal {H}} f ({\varvec{\delta }}) = \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)), \quad f (\hat{{\varvec{\delta }}}) \ge \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)). $$

Hence, we have

$$\begin{aligned} \hat{{\varvec{\delta }}} = {\varvec{\delta }}^\star (\hat{h})\quad \left( \hat{h} = \arg \min _{h \in \mathbb {R}_+} f ({\varvec{\delta }}^\star (h)) \right) . \end{aligned}$$
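
In code, \({\varvec{\delta }}^\star (h)\) is a simple element-wise truncation; a minimal sketch (illustrative names, with z2 holding \(z_1^2, \ldots , z_k^2\)) is:

```python
# Minimal sketch (illustrative): delta*(h) element-wise, where z2 holds
# z_1^2, ..., z_k^2.
import numpy as np

def delta_star(h, z2):
    return np.where(h >= z2, 1.0, h / z2)
```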

Using this result, we minimize the following function:

$$\begin{aligned} f ({\varvec{\delta }}^\star (h)) = f_1 (h) = \dfrac{ r_1 (h) }{ c_1 (h)^2 }, \end{aligned}$$

where \(r_1(h)=r({\varvec{\delta }}^\star (h))\) and \(c_1(h)=c({\varvec{\delta }}^\star (h))\), which can be calculated as

$$\begin{aligned} r_1 (h)&= \frac{1}{n} \left[ (n-3p - k-1) s_0^2 + \sum _{j=1}^k \left\{ I (h<z_j^2 ) \left( \dfrac{h}{z_j^2} - 1 \right) + 1 \right\} ^2 z_j^2 \right] , \\ c_1 (h)&= 1- \frac{1}{n} \left[ 3p + k + 1 - \sum _{j=1}^k \left\{ I (h <z_j^2 ) \left( \dfrac{h}{z_j^2} - 1 \right) + 1 \right\} \right] . \end{aligned}$$
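
A direct implementation of \(f_1(h)\), again as a hedged sketch with illustrative names, reads:

```python
# Minimal sketch (illustrative): f_1(h) = r_1(h) / c_1(h)^2 from the display
# above, given z_1^2, ..., z_k^2, n, p and s_0^2.
import numpy as np

def f1(h, z2, n, p, s0_sq):
    k = len(z2)
    delta = np.where(h >= z2, 1.0, h / z2)     # = delta*(h)
    r1 = ((n - 3 * p - k - 1) * s0_sq + np.sum(delta ** 2 * z2)) / n
    c1 = 1.0 - (3 * p + k + 1 - np.sum(delta)) / n
    return r1 / c1 ** 2
```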

Suppose that \(h \in R_a\), where \(a \in \{0,1,\ldots ,k\}\) and \(R_a\) is the range defined by (12). Notice that \(u_j\) is the jth order statistic of \(z_1^2, \ldots , z_k^2\). Then, we have

$$ f_1 (h) = f_{1a} (h) = \dfrac{ r_{1a} (h) }{ c_{1a} (h)^2 }\quad (h \in R_a), $$

where the functions \(r_{1a}(h)\) and \(c_{1a}(h)\) are given by

$$\begin{aligned} r_1 (h)&= r_{1a} (h) = \frac{1}{n} \left\{ (n-3p - k-1+a) s_a^2 + \ell _a h^2 \right\} , \\ c_1 (h)&= c_{1a} (h) = 1 - \frac{1}{n} ( 3p + k + 1 - a - \ell _a h ), \end{aligned}$$

where \(s_a^2\) is given in (11) and \(\ell _a = \sum _{j=a+1}^k 1/u_j\). Letting \(g_a (h) = (n - 3p - k - 1 + a) (h - s_a^2)\), a simple calculation yields

$$\begin{aligned} \dfrac{d}{dh} f_{1a} (h)&= \dfrac{2 \ell _a}{n^2 c_{1a} (h)^3} g_a (h). \end{aligned}$$

Here, we note that \(g_a (u_{a+1})= g_{a+1} (u_{a+1})\) \((a = 0, \dots , k-1)\). Moreover, \(2 \ell _a/\{n^2 c_{1a} (h)^3\}\) is positive, \(\lim _{h \rightarrow 0} g_0 (h) < 0\), and \(g_a (h)\) \((h \in R_a)\) is monotonically increasing in \(h\), since \(n-3p-k-1>0\) and \(a \ge 0\). Consequently, Eq. (13) is obtained, after some calculation, by combining the expression for \(\hat{\delta }_j\) with the stationary point \(h^*=s_a^2\) satisfying \(d f_{1a}(h)/d h|_{h=h^*}=0\).
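
Because each \(g_a\) is increasing and the pieces agree at the breakpoints \(u_{a+1}\), the minimizer can be located by scanning the candidate stationary points \(h^* = s_a^2\). The sketch below is one possible implementation; since Eqs. (11) and (12) are not reproduced in this excerpt, it assumes, consistently with the display for \(r_{1a}(h)\) above, that \((n-3p-k-1+a)s_a^2 = (n-3p-k-1)s_0^2 + u_1 + \cdots + u_a\) and that \(R_a = [u_a, u_{a+1})\) with \(u_0 = 0\) and \(u_{k+1} = \infty\); both are stated here as assumptions, not as the paper's definitions.

```python
# Minimal sketch (illustrative, not the authors' code) of the non-iterative
# minimization described above.  ASSUMPTIONS (Eqs. (11)-(12) are not shown in
# this excerpt): s_a^2 satisfies
#   (n - 3p - k - 1 + a) s_a^2 = (n - 3p - k - 1) s_0^2 + u_1 + ... + u_a,
# and R_a = [u_a, u_{a+1}) with u_0 = 0, u_{k+1} = +infinity.
import numpy as np

def optimal_h(z2, n, p, s0_sq):
    """Return h_hat minimizing f_1 by scanning the stationary points h* = s_a^2."""
    k = len(z2)
    u = np.sort(z2)                                          # u_1 <= ... <= u_k
    m = n - 3 * p - k - 1                                    # assumed positive
    s_sq = (m * s0_sq + np.concatenate(([0.0], np.cumsum(u)))) / (m + np.arange(k + 1))
    lo = np.concatenate(([0.0], u))                          # left ends of R_0, ..., R_k
    hi = np.concatenate((u, [np.inf]))                       # right ends
    for a in range(k + 1):
        # g_a vanishes at s_a^2; the minimizer is the stationary point that
        # actually lies inside its own range R_a.
        if lo[a] <= s_sq[a] < hi[a]:
            return s_sq[a]
    return s_sq[k]                                           # fallback

def optimal_delta(z2, n, p, s0_sq):
    h_hat = optimal_h(z2, n, p, s0_sq)
    return np.where(h_hat >= z2, 1.0, h_hat / z2)            # delta_hat = delta*(h_hat)
```

A quick sanity check is to compare optimal_h against a dense grid evaluation of f1 from the previous sketch.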


Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.


Cite this paper

Fukui, K., Ohishi, M., Yamamura, M., Yanagihara, H. (2020). A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_24
