Abstract
Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) proposed a covariance graphical lasso method that applies a lasso penalty to the elements of the covariance matrix. The method is attractive because it not only produces sparse and positive definite estimates of the covariance matrix but also discovers marginal independence structures by generating exact zeros in the estimate. However, the objective function is not convex, which makes the optimization challenging. Bien and Tibshirani described a majorize-minimize approach to optimize it. We develop a new optimization method based on coordinate descent and discuss its convergence properties. Through simulation experiments, we show that the new algorithm has a number of advantages over the majorize-minimize approach, including simplicity, computing speed and numerical stability. Finally, we show that the cyclic version of the coordinate descent algorithm is more efficient than the greedy version.
References
Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98(4), 807–820 (2011). doi:10.1093/biomet/asr054
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
Dempster, A.P.: Covariance selection. Biometrics 28, 157–175 (1972)
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Fu, W.J.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7(3), 397–416 (1998)
Hunter, D.R., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58(1), 30–37 (2004)
Lin, N.: A penalized likelihood approach in covariance graphical model selection. Ph.D. Thesis, National University of Singapore (2010)
Sardy, S., Bruce, A.G., Tseng, P.: Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9(2), 361–379 (2000)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)
Appendix: Details of the majorize-minimize MM algorithm in Sect. 4
Consider \(\sqrt{\sigma_{ij}^{2}+\epsilon}\) as an approximation to \(|\sigma_{ij}|\) for a small \(\epsilon>0\). The original objective function (1) can then be approximated by the smooth surrogate objective (9).
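The display for (9) did not survive extraction. Assuming the covariance graphical lasso objective in (1) is \(\log\det\Sigma+\operatorname{tr}(S\Sigma^{-1})+\rho\lVert\Sigma\rVert_{1}\), a plausible reconstruction of the \(\epsilon\)-smoothed surrogate is
\[
\log\det\Sigma + \operatorname{tr}(S\Sigma^{-1}) + \rho\sum_{i\neq j}\sqrt{\sigma_{ij}^{2}+\epsilon} + \rho\sum_{i}\sigma_{ii},
\]
where the diagonal sum appears only if (1) penalizes the diagonal; the diagonal needs no smoothing since \(\sigma_{ii}>0\) for positive definite \(\Sigma\). The count of \(p(p-1)\) smoothed off-diagonal terms is consistent with the error bound \(\rho p(p-1)\sqrt{\epsilon}\) used at the end of this appendix.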
Note the inequality
\[
\sqrt{\sigma_{ij}^{2}+\epsilon}\;\le\;\sqrt{\bigl(\sigma_{ij}^{(k)}\bigr)^{2}+\epsilon}
+\frac{\sigma_{ij}^{2}-\bigl(\sigma_{ij}^{(k)}\bigr)^{2}}{2\sqrt{\bigl(\sigma_{ij}^{(k)}\bigr)^{2}+\epsilon}}
\]
for a fixed \(\sigma_{ij}^{(k)}\) and all \(\sigma_{ij}\); it holds because \(t\mapsto\sqrt{t+\epsilon}\) is concave and therefore lies below its tangent line at \(t=(\sigma_{ij}^{(k)})^{2}\), with equality at \(\sigma_{ij}=\sigma_{ij}^{(k)}\). Then (9) is majorized by the quadratic surrogate (10).
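Since the display equations of this appendix were lost, a quick numerical check of the tangent-line bound above may be helpful. The sketch is ours, for illustration only; eps and sigma_k are arbitrary test values and NumPy is assumed.

```python
import numpy as np

# Illustrative check (ours): the tangent-line majorizer of sqrt(s^2 + eps)
# dominates the function everywhere and is tight at the expansion point.
eps, sigma_k = 1e-4, 0.5

def f(s):
    # The smoothed absolute value sqrt(s^2 + eps).
    return np.sqrt(s**2 + eps)

def majorizer(s):
    # Tangent of t -> sqrt(t + eps) at t = sigma_k^2, evaluated at t = s^2.
    return f(sigma_k) + (s**2 - sigma_k**2) / (2.0 * f(sigma_k))

s = np.linspace(-2.0, 2.0, 401)
assert np.all(majorizer(s) >= f(s) - 1e-12)        # dominates everywhere
assert np.isclose(majorizer(sigma_k), f(sigma_k))  # tight at sigma_k
```

Substituting this bound into (9) replaces each \(\sqrt{\sigma_{ij}^{2}+\epsilon}\) by a quadratic in \(\sigma_{ij}\), which is why the column updates below reduce to ridge-type problems.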
The minimize-step of the MM algorithm then minimizes (10) along each column (and corresponding row) of \(\Sigma\). Without loss of generality, consider the last column and row. Partition \(\Sigma\) and \(S\) as in (2) and apply the same transformation from \((\boldsymbol{\sigma}_{12},\sigma_{22})\) to \((\boldsymbol{\beta}=\boldsymbol{\sigma}_{12},\ \gamma=\sigma_{22}-\boldsymbol{\sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\sigma}_{12})\). The four terms of (10) can then be written as functions of \((\boldsymbol{\beta},\gamma)\) as displayed in (11), where \(c_{1}\), \(c_{2}\) and \(c_{3}\) are constants not involving \((\boldsymbol{\beta},\gamma)\). Dropping \(c_{1}\), \(c_{2}\) and \(c_{3}\) from (10), we only have to minimize the remaining terms of (11).
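The display for (11) was also lost. Its derivation rests on two standard block-matrix identities for the partition in (2); we restate them here as a reconstruction of the ingredients (the identification of \(c_{1}\) and \(c_{2}\) below is our reading, and \(c_{3}\) presumably collects the penalty terms that involve only \(\boldsymbol{\Sigma}_{11}\)):
\[
\log\det\boldsymbol{\Sigma}=\log\gamma+\underbrace{\log\det\boldsymbol{\Sigma}_{11}}_{c_{1}},
\qquad
\operatorname{tr}(S\boldsymbol{\Sigma}^{-1})=\underbrace{\operatorname{tr}(S_{11}\boldsymbol{\Sigma}_{11}^{-1})}_{c_{2}}
+\frac{\boldsymbol{\beta}'\boldsymbol{\Sigma}_{11}^{-1}S_{11}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\beta}
-2\,\mathbf{s}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\beta}+s_{22}}{\gamma}.
\]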
For \(\gamma\), it is easy to derive from (11) that the conditional minimum point given \(\boldsymbol{\beta}\) is the same as in (5). For \(\boldsymbol{\beta}\), (11) can be written as a quadratic function of \(\boldsymbol{\beta}\) with \(V\) and \(u\) defined as in (6); the conditional minimum point of \(\boldsymbol{\beta}\) is therefore \(\boldsymbol{\beta}=(V+\rho D^{-1})^{-1}u\).
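The quadratic display for the \(\boldsymbol{\beta}\)-step was likewise lost. Given the stated minimizer, it presumably takes the form below (a reconstruction, with \(D\) the diagonal matrix of majorization weights from (10)):
\[
\boldsymbol{\beta}'V\boldsymbol{\beta}-2u'\boldsymbol{\beta}+\rho\,\boldsymbol{\beta}'D^{-1}\boldsymbol{\beta},
\]
whose gradient \(2(V+\rho D^{-1})\boldsymbol{\beta}-2u\) vanishes precisely at \(\boldsymbol{\beta}=(V+\rho D^{-1})^{-1}u\).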
Cycling through every column monotonically drives down the approximated objective function (9). In our implementation, the value of \(\epsilon\) is chosen as follows. The approximation error of (9) relative to (1) is bounded by
\[
\rho\sum_{i\neq j}\Bigl(\sqrt{\sigma_{ij}^{2}+\epsilon}-|\sigma_{ij}|\Bigr)\le\rho\,p(p-1)\sqrt{\epsilon},
\]
since \(0\le\sqrt{x^{2}+\epsilon}-|x|\le\sqrt{\epsilon}\) for every \(x\).
Note that the algorithm stops when the change of the objective function is less than \(10^{-3}\). We choose \(\epsilon\) such that \(\rho p(p-1)\sqrt{\epsilon}=0.001\), i.e., \(\epsilon=\bigl(0.001/(\rho p(p-1))\bigr)^{2}\), to ensure that the choice of \(\epsilon\) has no more influence on the estimated \(\Sigma\) than the stopping rule.
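As a concrete illustration of this rule (the helper function and the example values are ours, not from the paper):

```python
import math

# Pick eps so that the worst-case smoothing bias rho * p * (p - 1) * sqrt(eps)
# equals the stopping tolerance of the algorithm.
def choose_epsilon(rho: float, p: int, tol: float = 1e-3) -> float:
    return (tol / (rho * p * (p - 1))) ** 2

eps = choose_epsilon(rho=0.1, p=10)   # example values
print(f"{eps:.3e}")                   # 1.235e-08
assert math.isclose(0.1 * 10 * 9 * math.sqrt(eps), 1e-3)
```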
Cite this article
Wang, H. Coordinate descent algorithm for covariance graphical lasso. Stat Comput 24, 521–529 (2014). https://doi.org/10.1007/s11222-013-9385-5