Abstract
When input features are naturally grouped or generated by factors in a linear classification problem, it is more meaningful to identify important groups or factors rather than individual features. The F ∞-norm support vector machine (SVM) and the group lasso penalized SVM have been developed to perform simultaneous classification and factor selection. However, these group-wise penalized SVM methods may suffer from estimation inefficiency and model selection inconsistency because they cannot perform feature selection within an identified group. To overcome this limitation, we propose the hierarchically penalized SVM (H-SVM) that not only effectively identifies important groups but also removes irrelevant features within an identified group. Numerical results are presented to demonstrate the competitive performance of the proposed H-SVM over existing SVM methods.
Acknowledgments
The authors are grateful to the editor and the reviewers for their constructive and insightful comments and suggestions, which helped to dramatically improve the quality of this paper. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by (1) the Ministry of Science, ICT and Future Planning (NRF-2013R1A1A1007536) for S. Bang and (2) the Ministry of Education (NRF-2013R1A1A2A10007545) for M. Jhun.
Appendix: Proofs
Proof of Lemma 1
Let \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\,\varvec{\gamma},\theta_{0} ,\varvec{\theta})\) denote the criterion that we would like to minimize in problem (2.5), let \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\) denote the corresponding criterion in problem (2.6), and let \((\hat{\varvec{\gamma }}^{*} ,\hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{*} )\) denote a local minimizer of \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\,\varvec{\gamma},\theta_{0} ,\varvec{\theta})\). We will prove that \((\hat{\varvec{\gamma }}^{\diamondsuit } = \lambda_{1} \hat{\varvec{\gamma }}^{*} ,\;\hat{\theta }_{0}^{\diamondsuit } = \hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{\diamondsuit } = \hat{\varvec{\theta }}^{*} /\lambda_{1} )\) is a local minimizer of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\).
We immediately have \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\,\varvec{\gamma},\theta_{0} ,\varvec{\theta}) = Q^{\diamondsuit } (\lambda ,\lambda_{1} \varvec{\gamma},\theta_{0} ,\,\varvec{\theta}/\lambda_{1} )\). Since \((\hat{\varvec{\gamma }}^{*} ,\hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{*} )\) is a local minimizer of \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\,\varvec{\gamma},\theta_{0} ,\varvec{\theta})\), there exists δ > 0 such that if \((\varvec{\gamma^{\prime}},\theta^{\prime}_{0} ,\,\varvec{\theta^{\prime}})\) satisfies \(\left\| {\varvec{\gamma^{\prime}} - \hat{\varvec{\gamma }}^{*} } \right\|_{1} + \left\| {\theta^{\prime}_{0} - \hat{\theta }_{0}^{*} } \right\|_{1} + \left\| {\varvec{\theta^{\prime}} - \hat{\varvec{\theta }}^{*} } \right\|_{1} < \delta\), then \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\hat{\varvec{\gamma }}^{*} ,\hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{*} ) \le Q^{*} (\lambda_{1} ,\lambda_{2} ,\varvec{\gamma^{\prime}},\theta^{\prime}_{0} ,\,\varvec{\theta^{\prime}})\).
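This identity can be checked term by term. A short sketch, assuming (as the proof of Lemma 2 suggests) that the penalty in (2.5) is \(\lambda_{1} \sum\nolimits_{k} \gamma_{k} + \lambda_{2} \sum\nolimits_{k} \|\varvec{\theta}_{(k)}\|_{1}\), that the penalty in (2.6) is \(\sum\nolimits_{k} \gamma_{k} + \lambda \sum\nolimits_{k} \|\varvec{\theta}_{(k)}\|_{1}\), and that \(\lambda = \lambda_{1} \lambda_{2}\):

```latex
% Under gamma' = lambda_1 * gamma and theta' = theta / lambda_1:
\[
\gamma_k' \theta_{kj}' = (\lambda_1 \gamma_k)(\theta_{kj}/\lambda_1) = \gamma_k \theta_{kj},
\]
% so the hinge-loss part of the criterion is unchanged, while the penalty satisfies
\[
\sum_{k=1}^{K} \gamma_k' + \lambda \sum_{k=1}^{K} \|\varvec{\theta}_{(k)}'\|_1
= \lambda_1 \sum_{k=1}^{K} \gamma_k
  + \frac{\lambda_1 \lambda_2}{\lambda_1} \sum_{k=1}^{K} \|\varvec{\theta}_{(k)}\|_1
= \lambda_1 \sum_{k=1}^{K} \gamma_k + \lambda_2 \sum_{k=1}^{K} \|\varvec{\theta}_{(k)}\|_1 .
\]
```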
Choose δ ′ such that δ ′/ min (λ 1, 1/λ 1) ≤ δ. Then for any \((\varvec{\gamma^{\prime\prime}},\theta^{\prime\prime}_{0} ,\,\varvec{\theta^{\prime\prime}})\) satisfying \(\left\| {\varvec{\gamma^{\prime\prime}} - \hat{\varvec{\gamma }}^{\diamondsuit } } \right\|_{1}\) \(+ \left\| {\theta^{\prime\prime}_{0} - \hat{\theta }_{0}^{\diamondsuit } } \right\|_{1} + \left\| {\varvec{\theta^{\prime\prime}} - \hat{\varvec{\theta }}^{\diamondsuit } } \right\|_{1} < \delta^{'}\), we have
Hence
Therefore, \((\hat{\varvec{\gamma }}^{\diamondsuit } = \lambda_{1} \hat{\varvec{\gamma }}^{*} ,\;\hat{\theta }_{0}^{\diamondsuit } = \hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{\diamondsuit } = \hat{\varvec{\theta }}^{*} /\lambda_{1} )\) is a local minimizer of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\).
Similarly, we can prove that for any local minimizer \((\hat{\varvec{\gamma }}^{\diamondsuit } ,\hat{\theta }_{0}^{\diamondsuit } ,\,\hat{\varvec{\theta }}^{\diamondsuit } )\) of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\), there is a corresponding local minimizer \((\hat{\varvec{\gamma }}^{*} ,\hat{\theta }_{0}^{*} ,\,\hat{\varvec{\theta }}^{*} )\) of \(Q^{*} (\lambda_{1} ,\lambda_{2} ,\,\varvec{\gamma},\theta_{0} ,\varvec{\theta})\) satisfying \(\hat{\gamma }^{*}_{k} \hat{\theta }^{*}_{kj} = \hat{\gamma }^{\diamondsuit }_{k} \hat{\theta }^{\diamondsuit }_{kj}\) and \(\hat{\theta }^{*}_{0} = \hat{\theta }^{\diamondsuit }_{0}\). □
Proof of Lemma 2
Without loss of generality, let \(\beta_{0}\) and \(\varvec{\beta}\) be fixed at \(\hat{\beta }_{0}\) and \(\hat{\varvec{\beta }}\), respectively, and let \(Q^{\diamondsuit } (\lambda ,\,\varvec{\gamma},\,\theta_{0} ,\,\varvec{\theta})\) be the corresponding criterion that we would like to minimize in problem (2.6). Then \(Q^{\diamondsuit } (\lambda ,\,\varvec{\gamma},\,\theta_{0} ,\,\varvec{\theta})\) depends on \((\varvec{\gamma},\varvec{\theta})\) only through the penalty term \(\sum\nolimits_{k = 1}^{K} {\gamma_{k} } + \lambda \sum\nolimits_{k = 1}^{K} {||\varvec{\theta}_{(k)} ||_{1} }\). For each k with \(\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} \ne 0\), the corresponding penalty term is \(\gamma_{k} + \lambda \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} / \gamma_{k}\), which is minimized at \(\hat{\gamma }_{k} = \lambda^{1/2} \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} = \left( {\lambda \sum\nolimits_{j = 1}^{{p_{k} }} {|\hat{\beta }_{kj} |} } \right)^{1/2}\). Denote \(\Delta\varvec{\beta} = (\Delta\varvec{\beta}_{(1)}^{T} ,\ldots,\Delta\varvec{\beta}_{(K)}^{T} )^{T}\), \(\Delta\varvec{\beta}^{(1)} = (\Delta\varvec{\beta}_{(1)}^{(1)T} ,\ldots,\Delta\varvec{\beta}_{(K)}^{(1)T} )^{T}\) and \(\Delta\varvec{\beta}^{(2)} = (\Delta\varvec{\beta}_{(1)}^{(2)T} ,\ldots,\Delta\varvec{\beta}_{(K)}^{(2)T} )^{T}\), where \(\Delta\varvec{\beta}_{(k)}^{(1)} = {\mathbf{0}}\) if \(\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} = 0\) and \(\Delta\varvec{\beta}_{(k)}^{(2)} = {\mathbf{0}}\) if \(\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} \ne 0\) for k = 1, …, K. We thus have \(\left\| {\Delta\varvec{\beta}} \right\|_{1} = \left\| {\Delta\varvec{\beta}^{(1)} } \right\|_{1} + \left\| {\Delta\varvec{\beta}^{(2)} } \right\|_{1}\). Let \(Q(\lambda ,\beta_{0} ,\,\varvec{\beta})\) be the corresponding criterion in problem (2.7).
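The closed-form minimizer \(\hat{\gamma }_{k} = (\lambda \|\hat{\varvec{\beta }}_{(k)}\|_{1})^{1/2}\) is the familiar AM–GM minimizer of \(g(\gamma) = \gamma + c/\gamma\) at \(\gamma = \sqrt{c}\). A quick numerical sanity check (the values of λ and \(\|\hat{\varvec{\beta }}_{(k)}\|_{1}\) below are arbitrary illustrations, not taken from the paper):

```python
import math

# Hypothetical illustration values (not from the paper)
lam = 0.8          # lambda
beta_norm = 1.5    # ||beta_hat_(k)||_1

def group_penalty(gamma):
    # per-group penalty gamma_k + lambda * ||beta_(k)||_1 / gamma_k, gamma_k > 0
    return gamma + lam * beta_norm / gamma

gamma_hat = math.sqrt(lam * beta_norm)   # closed-form minimizer sqrt(lambda * ||beta||_1)

# Compare against a fine grid of positive gamma values
grid = [0.01 * i for i in range(1, 1001)]
assert all(group_penalty(gamma_hat) <= group_penalty(g) + 1e-12 for g in grid)
assert abs(min(grid, key=group_penalty) - gamma_hat) < 0.01
print(round(gamma_hat, 4))  # -> 1.0954
```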
Now we show that there exists a δ ′ > 0 such that if \(\text{max}\{ |\Delta \beta_{0} |,||\Delta\varvec{\beta}||_{1} \} < \delta^{\prime}\), then \(Q(\lambda ,\hat{\beta }_{0} ,\,\hat{\varvec{\beta }}) \le Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\,\hat{\varvec{\beta }} + \Delta\varvec{\beta})\).
We first show \(Q(\lambda ,\hat{\beta }_{0} ,\,\hat{\varvec{\beta }}) \le Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} )\). By the argument given at the beginning of the proof, we have \(\hat{\gamma }_{k} = \lambda^{1/2} \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2}\) and \(\hat{\varvec{\theta }}_{(k)} = \hat{\varvec{\beta }}_{(k)} /\hat{\gamma }_{k}\) if \(|\hat{\gamma }_{k} | \ne 0\), and \(\hat{\varvec{\theta }}_{(k)} = {\mathbf{0}}\) if \(|\hat{\gamma }_{k} | = 0\). Clearly \(\hat{\theta }_{0} = \hat{\beta }_{0}\). Furthermore, let \(\hat{\gamma }^{\prime}_{k} = \lambda^{1/2} \left\| {\hat{\varvec{\beta }}_{(k)} + \Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1}^{1/2}\) and \(\varvec{\hat{\theta }^{\prime}}_{\left( k \right)} = (\hat{\varvec{\beta }}_{(k)} + \Delta\varvec{\beta}_{(k)}^{(1)} )/\hat{\gamma }^{\prime}_{k}\) if \(|\hat{\gamma }_{k} | \ne 0\), and let \(\hat{\gamma }^{\prime}_{k} = 0\) and \(\varvec{\hat{\theta }^{\prime}}_{(k)} = {\mathbf{0}}\) if \(|\hat{\gamma }_{k} | = 0\) and let \(\hat{\theta }^{\prime}_{0} = \hat{\beta }_{0} + \Delta \beta_{0}\). Then we have \(Q^{\diamondsuit } (\lambda ,\,\varvec{\hat{\gamma }^{\prime}},\,\hat{\theta }^{\prime}_{0} ,\,\varvec{\hat{\theta }^{\prime}}) = Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} )\) and \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }},\hat{\theta }_{0} ,\,\hat{\varvec{\theta }})\) \(= Q(\lambda ,\hat{\beta }_{0} ,\,\hat{\varvec{\beta }})\). 
As \((\hat{\varvec{\gamma }},\hat{\theta }_{0} ,\,\hat{\varvec{\theta }})\) is a local minimizer of \(Q^{\diamondsuit } (\lambda ,\,\varvec{\gamma},\,\theta_{0} ,\,\varvec{\theta})\), there exists a δ > 0 such that for any \((\varvec{\gamma^{\prime}},\theta^{\prime}_{0} ,\,\varvec{\theta^{\prime}})\) satisfying \(\left\| {\varvec{\gamma^{\prime}} - \hat{\varvec{\gamma }}} \right\|_{1} + \left\| {\theta_{0}^{\prime } - \hat{\theta }_{0} } \right\|_{1} + \left\| {\varvec{\theta^{\prime}} - \hat{\varvec{\theta }}} \right\|_{1} < \delta\), we have \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }},\hat{\theta }_{0} ,\,\hat{\varvec{\theta }}) \le Q^{\diamondsuit } (\lambda ,\varvec{\gamma^{\prime}},\theta_{0}^{\prime } ,\,\varvec{\theta^{\prime}})\). For \(a = \hbox{min} \{ \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} :\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} \ne 0,\,\,k = 1,\ldots,K\}\) and δ ′ < a/2, we have \(|\hat{\gamma }^{\prime}_{k} - \hat{\gamma }_{k} | = \sqrt \lambda \left| {\left\| {\hat{\varvec{\beta }}_{(k)} + \Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1}^{1/2} - \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} } \right| \le \sqrt \lambda \left| {\left( {\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} + \left\| {\Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1} } \right)^{1/2} - \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} } \right| \le \frac{{\sqrt \lambda \left\| {\Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1} }}{{2\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} }} \le \frac{{\sqrt \lambda \left\| {\Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1} }}{2\sqrt a },\) by the triangle inequality \(||\varvec{a} + \varvec{b}||_{1}^{1/2} \le (||\varvec{a}||_{1} + ||\varvec{b}||_{1} )^{1/2}\) and the bound \((||\varvec{a}||_{1} + ||\varvec{b}||_{1} )^{1/2} - ||\varvec{a}||_{1}^{1/2} \le \frac{{||\varvec{b}||_{1} }}{{2||\varvec{a}||_{1}^{1/2} }}\).
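The second bound is the tangent-line (first-order concavity) inequality for the square root; a one-line derivation:

```latex
\[
(a + b)^{1/2} - a^{1/2}
= \frac{b}{(a+b)^{1/2} + a^{1/2}}
\le \frac{b}{2\,a^{1/2}}
\qquad \text{for } a > 0,\ b \ge 0,
\]
```

since \((a+b)^{1/2} \ge a^{1/2}\); here it is applied with \(a = \|\hat{\varvec{\beta }}_{(k)}\|_{1}\) and \(b = \|\Delta\varvec{\beta}_{(k)}^{(1)}\|_{1}\).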
Furthermore, by using the inequality \(\sqrt \lambda \left| {\left( {\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1} + \left\| {\Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1} } \right)^{1/2} - \left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} } \right|\) \(\le \frac{{\sqrt \lambda \left\| {\Delta\varvec{\beta}_{(k)}^{(1)} } \right\|_{1} }}{{2\left\| {\hat{\varvec{\beta }}_{(k)} } \right\|_{1}^{1/2} }}\), we have
Therefore, we are able to choose a δ ′ satisfying δ ′ < a/2 such that \(\left\| {\varvec{\hat{\gamma}^{\prime}} - \hat{\varvec{\gamma }}} \right\|_{1} + \left\| {\hat{\theta }_{0}^{\prime } - \hat{\theta }_{0} } \right\|_{1} + \left\| {\varvec{{\hat{\theta}}^{\prime}} - \hat{\varvec{\theta }}} \right\|_{1} < \delta\) when \(\left\| {\Delta\varvec{\beta}^{(1)} } \right\|_{1} < \delta^{\prime}\). Hence we have \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }},\hat{\theta }_{0} ,\,\hat{\varvec{\theta }}) \le Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }^{\prime}},\hat{\theta }^{\prime}_{0} ,\,\hat{\varvec{\theta }^{\prime}})\) due to the local minimality. Hence \(Q(\lambda ,\hat{\beta }_{0} ,\,\hat{\varvec{\beta }})\) \(\le Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} )\).
Next we show \(Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} ) \le Q(\lambda ,\,\hat{\beta }_{0} + \Delta \beta_{0} ,\,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} + \Delta\varvec{\beta}^{(2)} )\). Note that the Lipschitz continuity of the hinge loss function \(\left[ {h(\beta_{0} ,\varvec{\beta})} \right]_{ + }\) implies that
for some L ′ > 0, where \(h(\beta_{0} ,\varvec{\beta}) = 1 - y_{i} \left( \beta_{0} + \sum\nolimits_{k = 1}^{K} \varvec{x}_{i,(k)}^{T} \varvec{\beta}_{(k)} \right)\). Moreover, we can choose a real number L such that
Hence, there exists a number L in \({\mathbb{R}}\) such that
Since \(\left\| {\Delta\varvec{\beta}^{(2)} } \right\|_{1} < \delta^{\prime}\) for a small enough δ ′, the second term on the right-hand side of the above equality dominates the first term; hence we have \(Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} ) \le Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\hat{\varvec{\beta }} + \Delta\varvec{\beta}^{(1)} + \Delta\varvec{\beta}^{(2)} )\). Thus we have shown that there exists a δ ′ > 0 such that if \(\text{max}\{ |\Delta \beta_{0} |, \left\| {\Delta\varvec{\beta}} \right\|_{1} \} < \delta^{\prime}\), then \(Q(\lambda ,\hat{\beta }_{0} ,\hat{\varvec{\beta }}) \le Q(\lambda ,\hat{\beta }_{0} + \Delta \beta_{0} ,\hat{\varvec{\beta }} + \Delta\varvec{\beta})\), which implies that \((\hat{\beta }_{0} ,\hat{\varvec{\beta }})\) is a local minimizer of \(Q(\lambda ,\,\beta_{0} ,\,\varvec{\beta})\).
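The Lipschitz property invoked for the hinge loss is the elementary bound (a sketch of this step, not verbatim from the paper):

```latex
\[
\bigl|\,[u]_{+} - [v]_{+}\,\bigr| \le |u - v|
\qquad \text{for all } u, v \in \mathbb{R},
\]
```

so the change in the empirical hinge loss caused by \(\Delta\varvec{\beta}^{(2)}\) is at most linear in \(\|\Delta\varvec{\beta}^{(2)}\|_{1}\), whereas for each group with \(\|\hat{\varvec{\beta }}_{(k)}\|_{1} = 0\) the penalty increases by a term of order \(\lambda^{1/2}\|\Delta\varvec{\beta}_{(k)}^{(2)}\|_{1}^{1/2}\); since \(t^{1/2}\) dominates \(Lt\) as \(t \to 0\), the penalty term prevails for small enough δ ′.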
Similarly we can prove that if \((\hat{\beta }_{0} ,\hat{\varvec{\beta }})\) is a local minimizer of \(Q(\lambda ,\,\beta_{0} ,\,\varvec{\beta})\), then \((\hat{\varvec{\gamma }},\hat{\theta }_{0} ,\,\hat{\varvec{\theta }})\) is a local minimizer of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\) satisfying \(\hat{\beta }_{0} = \hat{\theta }_{0}\) and \(\hat{\beta }_{kj} = \hat{\gamma }_{k} \hat{\theta }_{kj}\). □
Proof of Lemma 3
Clearly, \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\) is bounded below because each of its terms is nonnegative. Let \(\hat{\varvec{\gamma }}^{(t)}\) and \((\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\) denote the minimizers of the criterion in the optimization problems (3.1) and (3.2) at the tth iteration, respectively. Note that the minimization problems in Steps 1 and 2 are equivalent to minimizing \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t - 1)} ,\theta_{0} ,\,\varvec{\theta})\) with respect to \((\theta_{0} ,\,\varvec{\theta})\) and minimizing \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\) with respect to \(\varvec{\gamma}\), respectively. Since \(\hat{\varvec{\gamma }}^{(t)}\) is the minimizer of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\), we have \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} ) \le Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t - 1)} ,\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\).
Similarly, since \((\hat{\theta }_{0}^{(t + 1)} ,\,\hat{\varvec{\theta }}^{(t + 1)} )\) is the minimizer of \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\theta_{0} ,\,\varvec{\theta})\) and \(\hat{\varvec{\gamma }}^{(t + 1)}\) is the minimizer of \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\hat{\theta }_{0}^{(t + 1)} ,\,\hat{\varvec{\theta }}^{(t + 1)} )\), we can show that \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\hat{\theta }_{0}^{(t + 1)} ,\,\hat{\varvec{\theta }}^{(t + 1)} ) \le Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\) and \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t + 1)} ,\hat{\theta }_{0}^{(t + 1)} ,\,\hat{\varvec{\theta }}^{(t + 1)} ) \le Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\hat{\theta }_{0}^{(t + 1)} ,\hat{\varvec{\theta }}^{(t + 1)} )\). Therefore, we have \(Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t + 1)} ,\hat{\theta }_{0}^{(t + 1)} ,\,\hat{\varvec{\theta }}^{(t + 1)} ) \le Q^{\diamondsuit } (\lambda ,\hat{\varvec{\gamma }}^{(t)} ,\hat{\theta }_{0}^{(t)} ,\,\hat{\varvec{\theta }}^{(t)} )\), which implies that \(Q^{\diamondsuit } (\lambda ,\varvec{\gamma},\theta_{0} ,\,\varvec{\theta})\) is non-increasing at each iteration; being bounded below, the sequence of criterion values therefore converges. □
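The descent argument can be illustrated numerically. The following toy sketch is not the paper's H-SVM solver: a squared-error objective and grid searches stand in for the exact block minimizations of Steps 1 and 2. Because each block update is an exact minimization over a set containing the incumbent value, the criterion never increases:

```python
# Toy illustration of Lemma 3: alternating exact per-block minimization
# never increases the objective. Q mimics the structure of Q_diamond:
# a data-fit term in beta = gamma * theta plus the penalty gamma + lam*|theta|.

def Q(gamma, theta, lam=0.5, b=2.0):
    beta = gamma * theta
    return (beta - b) ** 2 + gamma + lam * abs(theta)

theta_grid = [i / 200 for i in range(-600, 601)]  # theta in [-3, 3]
gamma_grid = [i / 200 for i in range(0, 601)]     # gamma >= 0

gamma, theta = 1.0, 0.0   # initial values lie on the grids
vals = []
for t in range(6):
    theta = min(theta_grid, key=lambda th: Q(gamma, th))  # Step 1 analogue
    gamma = min(gamma_grid, key=lambda g: Q(g, theta))    # Step 2 analogue
    vals.append(Q(gamma, theta))

# Criterion values are non-increasing across iterations
assert all(vals[i + 1] <= vals[i] + 1e-12 for i in range(len(vals) - 1))
print(vals)
```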
Cite this article
Bang, S., Kang, J., Jhun, M. et al. Hierarchically penalized support vector machine with grouped variables. Int. J. Mach. Learn. & Cyber. 8, 1211–1221 (2017). https://doi.org/10.1007/s13042-016-0494-2