Efficient block-coordinate descent algorithms for the Group Lasso

Abstract

We present two algorithms to solve the Group Lasso problem (Yuan and Lin in J R Stat Soc Ser B (Stat Methodol) 68(1):49–67, 2006). First, we propose a general version of the Block Coordinate Descent (BCD) algorithm for the Group Lasso that employs an efficient approach for optimizing each subproblem exactly. We show that it exhibits excellent performance when the groups are of moderate size. For groups of large size, we propose an extension of ISTA/FISTA (Beck and Teboulle in SIAM J Imaging Sci 2(1):183–202, 2009) based on variable step-lengths that can be viewed as a simplified version of BCD. By combining the two approaches we obtain an implementation that is very competitive and often outperforms other state-of-the-art approaches for this problem. We show how these methods fit into the globally convergent general block coordinate gradient descent framework of Tseng and Yun (Math Program 117(1):387–423, 2009). We also show that the proposed approach is more efficient in practice than the one implemented in Tseng and Yun (Math Program 117(1):387–423, 2009). In addition, we apply our algorithms to the Multiple Measurement Vector (MMV) recovery problem, which can be viewed as a special case of the Group Lasso problem, and compare their performance to other methods in this particular instance.
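
For orientation, the Group Lasso problem referred to above is usually stated as follows (standard notation, not quoted from the paper: \(A\) is the design matrix, \(b\) the response vector, \(\mathcal{G}\) a partition of the variables into groups, \(x_{g}\) the sub-vector of \(x\) indexed by group \(g\), and \(w_{g}\) group weights, commonly \(\sqrt{|g|}\)):
\[
\min_{x}\; \frac{1}{2}\|Ax-b\|_{2}^{2} \;+\; \lambda \sum_{g\in \mathcal{G}} w_{g}\|x_{g}\|_{2}.
\]
An ISTA-type method of the kind mentioned above applies, for a step length \(t\), the block soft-thresholding (proximal) update
\[
u = x - t\,A^{\top}(Ax-b), \qquad x_{g} \leftarrow \Big(1-\frac{\lambda w_{g} t}{\|u_{g}\|_{2}}\Big)_{+} u_{g},
\]
which sets an entire group to zero whenever \(\|u_{g}\|_{2} \le \lambda w_{g} t\).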

Notes

  1. This is equivalent to a special case of what has recently become known as multi-task regression with structured sparsity [13].

  2. This algorithm coincides with the M-BCD method proposed recently in [21], while the first version of this paper was in preparation.

  3. We ran only the Group Lasso experiments on SLEP.

  4. We ran the MMV experiments on SPOR-SPG [4] and the Group Lasso experiments on SPG in [5].

References

  1. Bach, F.: Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  3. van den Berg, E., Friedlander, M.: Joint-sparse recovery from multiple measurements. arXiv preprint (2009)

  4. van den Berg, E., Friedlander, M.: Sparse optimization with least-squares constraints. Technical Report TR-2010-02, Department of Computer Science, University of British Columbia, Vancouver (2010)

  5. van den Berg, E., Schmidt, M., Friedlander, M., Murphy, K.: Group sparsity via linear-time projection. Technical Report TR-2008-09, Department of Computer Science, University of British Columbia, Vancouver (2008)

  6. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  7. Chen, J., Huo, X.: Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Process. 54(12) (2006)

  8. Dolan, E., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

  9. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  10. Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. Preprint (2010)

  11. Jacob, L., Obozinski, G., Vert, J.: Group Lasso with overlap and graph Lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440. ACM, New York (2009)

  12. Kim, D., Sra, S., Dhillon, I.: A scalable trust-region algorithm with application to mixed-norm regression. In: Proceedings of the 27th International Conference on Machine Learning (ICML) (2010)

  13. Kim, S., Xing, E.: Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th International Conference on Machine Learning (2010)

  14. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient \(\ell_{2,1}\)-norm minimization. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 339–348. AUAI Press, Corvallis (2009)

  15. Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections. Arizona State University (2009)

  16. Ma, S., Song, X., Huang, J.: Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics 8(1), 60 (2007)

  17. Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)

  18. Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)

  19. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. CORE Discussion Paper, Université catholique de Louvain (2010)

  20. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)

  21. Rakotomamonjy, A.: Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. 91(7), 1505–1526 (2011)

  22. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. arXiv preprint arXiv:1107.2848 (2011)

  23. Roth, V., Fischer, B.: The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In: Proceedings of the 25th International Conference on Machine Learning, pp. 848–855. ACM (2008)

  24. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., Paulovich, A., Pomeroy, S., Golub, T., Lander, E., et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102(43), 15545 (2005)

  25. Sun, L., Liu, J., Chen, J., Ye, J.: Efficient recovery of jointly sparse vectors. In: Advances in Neural Information Processing Systems (NIPS) (2009)

  26. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)

  27. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  28. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)

  29. Van De Vijver, M., He, Y., van’t Veer, L., Dai, H., Hart, A., Voskuil, D., Schreiber, G., Peterse, J., Roberts, C., Marton, M., et al.: A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347(25), 1999–2009 (2002)

  30. Vandenberghe, L.: Gradient methods for nonsmooth problems. EE236C lecture notes, UCLA (2008)

  31. Wright, S., Nowak, R., Figueiredo, M.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

  32. Yang, H., Xu, Z., King, I., Lyu, M.: Online learning for group lasso. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010) (2010)

  33. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)

  34. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)

Acknowledgments

We would like to thank Shiqian Ma for valuable discussions on the MMV problems. We also thank the two anonymous reviewers for their constructive comments, which improved this paper significantly.

Author information

Corresponding author

Correspondence to Zhiwei Qin.

Additional information

Research supported in part by NSF Grants DMS 06-06712 and DMS 10-16571, ONR Grant N00014-08-1-1118 and DOE Grant DE-FG02-08ER25856 and AFOSR Grant FA9550-11-1-0239.

Appendices

Appendix A: Simulated Group Lasso data sets

For the specifications of the data sets in Table 2, interested readers can refer to the references provided at the beginning of Sect. 7.1. Here, we provide the details of the data sets in Table 3.

1.1 A.1 yl1L

50 latent variables \(Z_{1}, \ldots , Z_{50}\) are simulated from a centered multivariate Gaussian distribution with the covariance between \(Z_{i}\) and \(Z_{j}\) equal to \(0.5^{|i-j|}\). The first 47 latent variables are encoded in \(\{0,\ldots ,9\}\) according to their inverse-cdf values, as done in [33]. The last three variables are encoded in \(\{0,\ldots ,999\}\). Each latent variable corresponds to one segment and contributes \(L\) columns to the design matrix, with column \(j\) containing the values of the indicator function \(I(Z_{i}=j)\), where \(L\) is the size of the encoding set for \(Z_{i}\). The responses are a linear combination of a sparse selection of the segments plus Gaussian noise. We simulate 5,000 observations.
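
The following numpy sketch illustrates this construction. It is not the authors' code; the noise level, the number of active segments, and the random seed are illustrative assumptions.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, p = 5000, 50                                   # observations, latent variables

    # Latent Gaussians with cov(Z_i, Z_j) = 0.5**|i-j|
    cov = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    Z = rng.multivariate_normal(np.zeros(p), cov, size=n)

    # Encode each latent variable by its inverse-cdf bin: 10 levels for the first 47
    # variables, 1000 levels for the last 3; each level becomes one indicator column.
    levels = [10] * 47 + [1000] * 3
    blocks, group_id = [], []
    for i, L in enumerate(levels):
        bins = np.minimum((norm.cdf(Z[:, i]) * L).astype(int), L - 1)
        blocks.append(np.eye(L)[bins])                # n-by-L block of indicators I(Z_i = j)
        group_id.append(np.full(L, i))
    A = np.hstack(blocks)
    group_id = np.concatenate(group_id)

    # Responses: linear combination of a sparse selection of segments plus Gaussian noise
    active = rng.choice(p, size=5, replace=False)     # number of active segments is an assumption
    x_true = np.where(np.isin(group_id, active), rng.standard_normal(A.shape[1]), 0.0)
    b = A @ x_true + 0.1 * rng.standard_normal(n)     # noise level is an assumption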

1.2 A.2 yl4L

110 latent variables are simulated in the same way as for the third data set in [33]. The first 50 variables contribute 3 columns each to the design matrix \(A\), with the \(i\)-th of the three columns containing the \(i\)-th power of the variable. The next 50 variables are encoded in a set of size 3, and the final 10 variables are encoded in a set of size 50, similar to yl1L. In addition, 4 groups of 1,000 Gaussian random numbers are appended to \(A\). The responses are constructed in a similar way as in yl1L. 2,000 observations are simulated.

1.3 A.3 mgb2L

5,001 variables are simulated as in yl1L, without categorization. They are then divided into six groups, with the first containing one variable and the rest containing 1,000 each. The responses are constructed in a similar way as in yl1L. We collect 2,000 observations.

1.4 A.4 ljyL

We simulate 15 groups of independent standard Gaussian random variables. The first five groups have size 5 each, and the last 10 groups contain 500 variables each. The responses are constructed in a similar way as in yl1L, and we simulate 2,000 observations.

1.5 A.5 glassoL1, glassoL2

The design matrix \(A\) is drawn from the standard Gaussian distribution. The length of each group is sampled from the uniform distribution \(U(10,50)\) with probability 0.9 and from the uniform distribution \(U(50,300)\) with probability 0.1. We continue to generate groups until the specified number of features \(m\) is reached. The ground-truth feature coefficients \(x^*\) are generated from \(\mathcal{N}(2,4)\) with approximately \(5\,\%\) non-zero values, and we assume \(\mathcal{N}(0.5,0.25)\) additive noise.
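
A minimal numpy sketch of this construction is given below; the sample size \(n\), the feature count \(m\), and the reading of the second parameter of \(\mathcal{N}(\cdot,\cdot)\) as a variance are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 1000, 10000                       # sample size and feature count are assumptions

    # Group lengths ~ U(10,50) with prob. 0.9 and ~ U(50,300) with prob. 0.1,
    # generated until m features are covered (the last group is trimmed to fit).
    lengths = []
    while sum(lengths) < m:
        lo, hi = (10, 50) if rng.random() < 0.9 else (50, 300)
        lengths.append(int(rng.integers(lo, hi + 1)))
    lengths[-1] -= sum(lengths) - m

    A = rng.standard_normal((n, m))          # standard Gaussian design matrix

    # x* ~ N(2, 4) with roughly 5% non-zero entries; N(0.5, 0.25) additive noise
    # (second parameters read as variances, i.e. standard deviations 2 and 0.5).
    x_true = rng.normal(2.0, 2.0, size=m) * (rng.random(m) < 0.05)
    b = A @ x_true + rng.normal(0.5, 0.5, size=n)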

Appendix B: MMV scalability test sets

The data sets in Table 5 are generated in the same way but with different attributes. Both the design matrix \(A\) and the non-zero rows of the ground truth \(X_0\) are drawn from the standard Gaussian distribution. The indices of the non-zero rows in \(X_0\) are sampled uniformly. The measurement matrix \(B\) is obtained as \(B = AX_0\). The data attributes, such as the number of measurements and the sparsity level of the ground truth, are set to ensure that the exact solution is recoverable by the MMV model.
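
A sketch under assumed dimensions follows; the numbers of measurements, features, measurement vectors, and non-zero rows below are placeholders chosen only to make the construction concrete.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, k, s = 128, 512, 10, 20                    # all four dimensions are assumptions

    A = rng.standard_normal((n, m))                  # design matrix
    X0 = np.zeros((m, k))                            # ground truth with s non-zero rows
    rows = rng.choice(m, size=s, replace=False)      # uniformly sampled row support
    X0[rows] = rng.standard_normal((s, k))
    B = A @ X0                                       # measurement matrix B = A X0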

Cite this article

Qin, Z., Scheinberg, K. & Goldfarb, D. Efficient block-coordinate descent algorithms for the Group Lasso. Math. Prog. Comp. 5, 143–169 (2013). https://doi.org/10.1007/s12532-013-0051-x
