Abstract
Although many statistical tools for analyzing the additive model rely on the back-fitting algorithm, it is well known that the algorithm is guaranteed neither to converge nor to yield a unique solution. Furthermore, running the algorithm on large datasets is computationally demanding. To address these issues, we propose a new optimization method for the additive model via partial generalized ridge regression. With the proposed method, all trends of the additive model are estimated simultaneously, and closed-form expressions are derived for the smoothing parameters that minimize the GCV criterion. In a numerical study, the new method outperformed the back-fitting algorithm in both predictive accuracy and computational efficiency.
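For context, the back-fitting algorithm that the proposed method replaces can be sketched as follows. This is a minimal illustration only: the toy data, the polynomial smoother, and the convergence tolerance are our own stand-ins, not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = np.sin(np.pi * x1) + x2 ** 2 + rng.normal(0, 0.2, n)

def smooth(x, r, degree=5):
    """Toy smoother: polynomial least-squares fit of a partial residual r."""
    X = np.vander(x, degree + 1)
    beta, *_ = np.linalg.lstsq(X, r, rcond=None)
    return X @ beta

# Back-fitting: cycle over the components, each time smoothing the partial
# residual of the other component, until the fits stop changing.
alpha = y.mean()
f1, f2 = np.zeros(n), np.zeros(n)
for _ in range(50):
    f1_new = smooth(x1, y - alpha - f2)
    f1_new -= f1_new.mean()            # center for identifiability
    f2_new = smooth(x2, y - alpha - f1_new)
    f2_new -= f2_new.mean()
    change = max(np.max(np.abs(f1_new - f1)), np.max(np.abs(f2_new - f2)))
    f1, f2 = f1_new, f2_new
    if change < 1e-8:
        break
fit = alpha + f1 + f2
```

Each pass solves one small smoothing problem per component, which is why large datasets and many components make the iteration expensive; the paper's method avoids the loop entirely.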
Appendix: The Proof of Equation (13)
A singular value decomposition of \((\varvec{I}_n - \varvec{P}_{{\varvec{W}}_1}) {\varvec{W}}_2\) is expressed as
$$\begin{aligned} ({\varvec{I}}_n - {\varvec{P}}_{{\varvec{W}}_1}) {\varvec{W}}_2 = {\varvec{G}} \begin{pmatrix} {\varvec{D}}^{1/2} \\ {\varvec{O}}_{n-k,k} \end{pmatrix} {\varvec{Q}}' = {\varvec{G}}_1 {\varvec{D}}^{1/2} {\varvec{Q}}', \end{aligned}$$
(14)
where \(\varvec{G}\) is an \(n\times n\) orthogonal matrix and \({\varvec{G}}_1\) is an \(n \times k\) matrix derived from the partition \({\varvec{G}}= ({\varvec{G}}_1, {\varvec{G}}_2)\). Note that \({\varvec{D}}^{1/2}\) is a \(k \times k\) diagonal matrix. From Eqs. (8) and (14), we can see that
Hence, \(\mathrm{tr}({\varvec{H}}_{{\varvec{\Lambda }}})\) can be calculated as
Notice that \({\varvec{G}}_1 = ({\varvec{I}}_n - {\varvec{P}}_{{\varvec{W}}_1}) {\varvec{W}}_2 {\varvec{Q}}{\varvec{D}}^{-1/2}\) from (14), and \(\varvec{I}_n = \varvec{G} \varvec{G}' = \varvec{G}_1 \varvec{G}_1'+\varvec{G}_2 \varvec{G}_2' \). Hence, we have \(\varvec{G}_1 ' (\varvec{I}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}= {\varvec{z}}\) and
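The identities above can be checked numerically. The following sketch uses arbitrary stand-in matrices (sizes and contents are our own choices) to verify the thin SVD, the expression for \({\varvec{G}}_1\), and the fact that \({\varvec{G}}_1'({\varvec{I}}_n - {\varvec{P}}_{{\varvec{W}}_1}){\varvec{y}}\) reduces to \({\varvec{G}}_1'{\varvec{y}}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 3, 6
W1 = rng.standard_normal((n, p))   # unpenalized block
W2 = rng.standard_normal((n, k))   # penalized block
y = rng.standard_normal(n)

# Orthogonal projection onto the column space of W1.
P_W1 = W1 @ np.linalg.solve(W1.T @ W1, W1.T)
M = (np.eye(n) - P_W1) @ W2

# Thin SVD: M = G1 @ diag(d^{1/2}) @ Q', with G1'G1 = I_k.
G1, d_half, Qt = np.linalg.svd(M, full_matrices=False)

# G1 = (I - P_W1) W2 Q D^{-1/2}, as noted in the text.
G1_alt = M @ Qt.T @ np.diag(1.0 / d_half)

# Since col(G1) lies inside col(I - P_W1), G1'(I - P_W1) y equals G1'y.
z = G1.T @ (np.eye(n) - P_W1) @ y
```

The last identity is what lets the projected response \({\varvec{z}}\) be computed with a single matrix-vector product.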
Using the above results and noting that \({\varvec{P}}_{{\varvec{W}}_1} {\varvec{G}}_1 = {\varvec{O}}_{n,k}\), we can derive the following equation:
By substituting (15) and the above result into (9), we obtain
Let \({\varvec{\delta }}= (\delta _1, \dots , \delta _k)'\) be a k-dimensional vector defined as
Note that \(\delta _j \in [0,1]\) since \(d_j \ge 0\) and \(\lambda _j \ge 0\). Then, the GCV criterion in (16) is expressed as the following function with respect to \({\varvec{\delta }}\):
where the functions \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are given by
Here, \(z_1,\dots ,z_k\) are given in (10). Let \(\hat{{\varvec{\delta }}} = (\hat{\delta }_1, \dots , \hat{\delta }_k)'\) be the minimizer of \(f({\varvec{\delta }})\) in (17), i.e.,
where \([0,1]^k\) is the \(k\)th Cartesian power of the set \([0, 1]\). Notice that \(r({\varvec{\delta }})\) and \(c({\varvec{\delta }})\) are differentiable with respect to each \(\delta _j\). Thus, we obtain
Hence, noting that \(c({\varvec{\delta }})\) is finite, we find a necessary condition for \(\hat{{\varvec{\delta }}}\) as
where \(h({\varvec{\delta }}) = r({\varvec{\delta }})/c({\varvec{\delta }}) > 0\).
On the other hand, let \(\mathcal {H}= \{ {\varvec{\delta }}\in [0, 1]^k \mid {\varvec{\delta }}= {\varvec{\delta }}^{\star } (h),\ h \in \mathbb {R}_+ \}\), where \({\varvec{\delta }}^\star (h)\) is the \(k\)-dimensional vector whose \(j\)th element is defined as
and \(\mathbb {R}_+\) is the set of nonnegative real numbers. Then, it follows from \(\mathcal {H}\subseteq [0, 1]^k\) and \(\hat{{\varvec{\delta }}} \in \mathcal {H}\) that
Hence, we have
Using this result, we minimize the following function:
where \(r_1(h)=r({\varvec{\delta }}^\star (h))\) and \(c_1(h)=c({\varvec{\delta }}^\star (h))\), which can be calculated as
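The explicit forms of \(r_1(h)\) and \(c_1(h)\) appear in the displayed equations, which are not reproduced here. Purely to illustrate the reduction from a \(k\)-dimensional search over \({\varvec{\delta }}\) to a one-dimensional search over \(h\), the following sketch assumes standard generalized-ridge GCV-type forms (a soft-threshold \({\varvec{\delta }}^\star (h)\), a penalized residual sum for \(r_1\), and \(c_1 = 1 - \mathrm{tr}/n\)); these assumed forms, the stand-in values of \(z\), and the residual constant are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 8
z = rng.standard_normal(k) * 3.0   # stand-in for z_1, ..., z_k
rss0 = 40.0                        # stand-in residual term

def delta_star(h):
    """Assumed soft-threshold form: delta_j = max(0, 1 - h / z_j^2)."""
    return np.maximum(0.0, 1.0 - h / z ** 2)

def f1(h):
    """GCV-type objective r1(h) / {n * c1(h)^2} under the assumed forms."""
    d = delta_star(h)
    r1 = rss0 + np.sum((1.0 - d) ** 2 * z ** 2)
    c1 = 1.0 - np.sum(d) / n
    return r1 / (n * c1 ** 2)

# A one-dimensional search over h replaces the k-dimensional search
# over delta; the paper sharpens this further to a closed form.
grid = np.linspace(1e-4, 1.5 * np.max(z ** 2), 2000)
h_hat = grid[np.argmin([f1(h) for h in grid])]
delta_hat = delta_star(h_hat)
```

The grid search is only a numerical surrogate: the piecewise analysis over the ranges \(R_a\) that follows replaces it with an exact minimizer.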
Suppose that \(h \in R_a\), where \(a \in \{0,1,\ldots ,k\}\) and \(R_a\) is the range defined by (12). Notice that \(u_j\) is the \(j\)th order statistic of \(z_1^2, \ldots , z_k^2\). Then, we have
where functions \(r_{1a}(h)\) and \(c_{1a}(h)\) are given by
where \(s_a^2\) is given in (11) and \(\ell _a = \sum _{j=a+1}^k 1/u_j\). Letting \(g_a (h) = (n - 3p - k - 1 + a) (h - s_a^2)\), a simple calculation yields
Here, we note that \(g_a (u_{a+1})= g_{a+1} (u_{a+1})\ (a = 0, \dots , k-1)\). Moreover, \(2 \ell _a/\{n^2 c_{1a} (h)^3\}\) is positive, \(\lim _{h \rightarrow 0} g_0 (h) < 0\), and \(g_a (h)\) is monotonically increasing in \(h\), since \(n-3p-k-1>0\) and \(a \ge 0\). Consequently, Eq. (13) is obtained, after some calculation, by combining \(\hat{\delta }_j\) with \(h^*=s_a^2\), where \(d f_{1a}(h)/d h|_{h=h^*}=0\).
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Fukui, K., Ohishi, M., Yamamura, M., Yanagihara, H. (2020). A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5924-2
Online ISBN: 978-981-15-5925-9
eBook Packages: Intelligent Technologies and Robotics (R0)