Abstract
During the past few years there has been an explosion of interest in learning methods based on sparsity regularization. In this chapter, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function ω with a linear function. This setting includes methods such as the Group Lasso, the Fused Lasso and multi-task learning, among many others. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function ω is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by the groundbreaking work of Vladimir Vapnik.
Dedicated to Vladimir Vapnik with esteem and gratitude for his fundamental contribution to Machine Learning.
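To fix ideas, the composite problems referred to in the abstract can be written in the following standard form, in which the proximity operator introduced by Moreau is the basic computational primitive. The display below is an illustrative reconstruction of this standard setting, not an excerpt from the chapter; the symbols $E$ (an error term), $B$ (a linear map) and $\lambda > 0$ (a regularization parameter) are generic placeholders.

\[
\min_{w \in \mathbb{R}^d} \; E(w) + \lambda\, \omega(Bw),
\qquad
\mathrm{prox}_{\lambda\omega}(x) \;=\; \operatorname*{argmin}_{u \in \mathbb{R}^d} \Bigl\{ \tfrac{1}{2}\,\lVert u - x\rVert^2 + \lambda\, \omega(u) \Bigr\}.
\]

For instance, with $\omega = \lVert\cdot\rVert_1$ and $B$ the identity, $\mathrm{prox}_{\lambda\omega}$ is the soft-thresholding map $x_i \mapsto \mathrm{sign}(x_i)\max(\lvert x_i\rvert - \lambda,\, 0)$ and the problem reduces to the Lasso; the Group Lasso and the Fused Lasso correspond to other choices of $\omega$ and $B$.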
Acknowledgements
Part of this work was supported by EPSRC Grant EP/H027203/1, by Royal Society International Joint Project Grant 2012/R2, and by the European Union Seventh Framework Programme (FP7 2007–2013) under grant agreement No. 246556.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Argyriou, A., Baldassarre, L., Micchelli, C.A., Pontil, M. (2013). On Sparsity Inducing Regularization Methods for Machine Learning. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_18
Print ISBN: 978-3-642-41135-9
Online ISBN: 978-3-642-41136-6