Abstract
The ℓ1,∞ norm and the ℓ1,2 norm are well-known tools for joint regularization in Group-Lasso methods. While the ℓ1,2 version has been studied in detail, open questions remain regarding the uniqueness of solutions and the efficiency of algorithms for the ℓ1,∞ variant. For the latter, we characterize the conditions for uniqueness of solutions, present a simple test for uniqueness, and derive a highly efficient active-set algorithm that can handle input dimensions in the millions. We compare both variants of the Group-Lasso in its two most common application scenarios: the first is obtaining sparsity on the level of groups in "standard" prediction problems; the second is multi-task learning, where the aim is to solve many learning problems in parallel that are coupled via the Group-Lasso constraint. We show that both versions perform quite similarly in "standard" applications. However, a clear distinction between the variants emerges in multi-task settings, where the ℓ1,2 version consistently outperforms its ℓ1,∞ counterpart in terms of prediction accuracy.
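To make the two penalties concrete, the following is a minimal NumPy sketch, not the authors' implementation: the weight vector, the grouping, and the helper name group_penalties are illustrative. The ℓ1,∞ penalty sums, over groups, the largest absolute coefficient of each group, while the ℓ1,2 penalty sums the Euclidean norms of the groups.

```python
import numpy as np

def group_penalties(w, groups):
    """Compute both mixed-norm Group-Lasso penalties for a weight
    vector w partitioned into groups (each group a list of indices).

    l1_inf: sum over groups of the max absolute coefficient (ell_1,inf).
    l1_2:   sum over groups of the Euclidean norm (ell_1,2).
    """
    l1_inf = sum(np.max(np.abs(w[g])) for g in groups)
    l1_2 = sum(np.linalg.norm(w[g]) for g in groups)
    return l1_inf, l1_2

# Illustrative example: a 6-dimensional weight vector split into 3 groups.
w = np.array([0.5, -1.0, 0.0, 2.0, 0.3, -0.3])
groups = [[0, 1], [2, 3], [4, 5]]
print(group_penalties(w, groups))
```

In both cases the outer ℓ1 sum over groups is what drives entire groups of coefficients to zero; the two variants differ only in how coefficients are aggregated within a group.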
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Vogt, J.E., Roth, V. (2010). The Group-Lasso: ℓ1,∞ Regularization versus ℓ1,2 Regularization. In: Goesele, M., Roth, S., Kuijper, A., Schiele, B., Schindler, K. (eds) Pattern Recognition. DAGM 2010. Lecture Notes in Computer Science, vol 6376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15986-2_26
Print ISBN: 978-3-642-15985-5
Online ISBN: 978-3-642-15986-2