Abstract
Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) proposed a covariance graphical lasso method that applies a lasso penalty to the elements of the covariance matrix. The method is attractive because it not only produces sparse and positive definite estimates of the covariance matrix but also discovers marginal independence structures by generating exact zeros in the estimate. However, the objective function is not convex, which makes the optimization challenging. Bien and Tibshirani described a majorize-minimize approach to optimize it. We develop a new optimization method based on coordinate descent and discuss its convergence properties. Through simulation experiments, we show that the new algorithm has a number of advantages over the majorize-minimize approach, including simplicity, computing speed and numerical stability. Finally, we show that the cyclic version of the coordinate descent algorithm is more efficient than the greedy version.
References
Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98(4), 807–820 (2011). doi:10.1093/biomet/asr054
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
Dempster, A.P.: Covariance selection. Biometrics 28, 157–175 (1972)
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Fu, W.J.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7(3), 397–416 (1998)
Hunter, D.R., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58(1), 30–37 (2004)
Lin, N.: A penalized likelihood approach in covariance graphical model selection. Ph.D. Thesis, National University of Singapore (2010)
Sardy, S., Bruce, A.G., Tseng, P.: Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9(2), 361–379 (2000)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)
Appendix: Details of the majorize-minimize MM algorithm in Sect. 4
Consider \(\sqrt{\sigma_{ij}^{2}+\epsilon}\) as an approximation to \(|\sigma_{ij}|\) for a small \(\epsilon>0\). The original objective function (1) can then be approximated by the smooth surrogate objective (9).
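The display for (9) did not survive extraction. Assuming the covariance graphical lasso objective in (1) is \(\log\det\Sigma+\operatorname{tr}(S\Sigma^{-1})+\rho\lVert\Sigma\rVert_{1}\), a plausible reconstruction of the \(\epsilon\)-smoothed surrogate is
\[
\log\det\Sigma + \operatorname{tr}(S\Sigma^{-1}) + \rho\sum_{i\neq j}\sqrt{\sigma_{ij}^{2}+\epsilon} + \rho\sum_{i}\sigma_{ii},
\]
where the diagonal sum appears only if (1) penalizes the diagonal; the diagonal needs no smoothing since \(\sigma_{ii}>0\) for positive definite \(\Sigma\). The count of \(p(p-1)\) smoothed off-diagonal terms is consistent with the error bound \(\rho p(p-1)\sqrt{\epsilon}\) used at the end of this appendix.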
Note the inequality
\[
\sqrt{\sigma_{ij}^{2}+\epsilon}\;\le\;\sqrt{\bigl(\sigma_{ij}^{(k)}\bigr)^{2}+\epsilon}
+\frac{\sigma_{ij}^{2}-\bigl(\sigma_{ij}^{(k)}\bigr)^{2}}{2\sqrt{\bigl(\sigma_{ij}^{(k)}\bigr)^{2}+\epsilon}}
\]
for a fixed \(\sigma_{ij}^{(k)}\) and all \(\sigma_{ij}\); it holds because \(t\mapsto\sqrt{t+\epsilon}\) is concave and therefore lies below its tangent line at \(t=(\sigma_{ij}^{(k)})^{2}\), with equality at \(\sigma_{ij}=\sigma_{ij}^{(k)}\). Then (9) is majorized by the quadratic surrogate (10).
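Since the display equations of this appendix were lost, a quick numerical check of the tangent-line bound above may be helpful. The sketch is ours, for illustration only; eps and sigma_k are arbitrary test values and NumPy is assumed.

```python
import numpy as np

# Illustrative check (ours): the tangent-line majorizer of sqrt(s^2 + eps)
# dominates the function everywhere and is tight at the expansion point.
eps, sigma_k = 1e-4, 0.5

def f(s):
    # The smoothed absolute value sqrt(s^2 + eps).
    return np.sqrt(s**2 + eps)

def majorizer(s):
    # Tangent of t -> sqrt(t + eps) at t = sigma_k^2, evaluated at t = s^2.
    return f(sigma_k) + (s**2 - sigma_k**2) / (2.0 * f(sigma_k))

s = np.linspace(-2.0, 2.0, 401)
assert np.all(majorizer(s) >= f(s) - 1e-12)        # dominates everywhere
assert np.isclose(majorizer(sigma_k), f(sigma_k))  # tight at sigma_k
```

Substituting this bound into (9) replaces each \(\sqrt{\sigma_{ij}^{2}+\epsilon}\) by a quadratic in \(\sigma_{ij}\), which is why the column updates below reduce to ridge-type problems.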
The minimize-step of the MM algorithm then minimizes (10) along each column (and corresponding row) of \(\Sigma\). Without loss of generality, consider the last column and row. Partition \(\Sigma\) and \(S\) as in (2) and apply the same transformation from \((\boldsymbol{\sigma}_{12},\sigma_{22})\) to \((\boldsymbol{\beta}=\boldsymbol{\sigma}_{12},\ \gamma=\sigma_{22}-\boldsymbol{\sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\sigma}_{12})\). The four terms of (10) can then be written as functions of \((\boldsymbol{\beta},\gamma)\) as displayed in (11), where \(c_{1}\), \(c_{2}\) and \(c_{3}\) are constants not involving \((\boldsymbol{\beta},\gamma)\). Dropping \(c_{1}\), \(c_{2}\) and \(c_{3}\) from (10), we only have to minimize the remaining terms of (11).
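The display for (11) was also lost. Its derivation rests on two standard block-matrix identities for the partition in (2); we restate them here as a reconstruction of the ingredients (the identification of \(c_{1}\) and \(c_{2}\) below is our reading, and \(c_{3}\) presumably collects the penalty terms that involve only \(\boldsymbol{\Sigma}_{11}\)):
\[
\log\det\boldsymbol{\Sigma}=\log\gamma+\underbrace{\log\det\boldsymbol{\Sigma}_{11}}_{c_{1}},
\qquad
\operatorname{tr}(S\boldsymbol{\Sigma}^{-1})=\underbrace{\operatorname{tr}(S_{11}\boldsymbol{\Sigma}_{11}^{-1})}_{c_{2}}
+\frac{\boldsymbol{\beta}'\boldsymbol{\Sigma}_{11}^{-1}S_{11}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\beta}
-2\,\mathbf{s}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\beta}+s_{22}}{\gamma}.
\]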
For \(\gamma\), it is easy to derive from (11) that the conditional minimum point given \(\boldsymbol{\beta}\) is the same as in (5). For \(\boldsymbol{\beta}\), (11) can be written as a quadratic function of \(\boldsymbol{\beta}\) with \(V\) and \(u\) defined as in (6); the conditional minimum point of \(\boldsymbol{\beta}\) is therefore \(\boldsymbol{\beta}=(V+\rho D^{-1})^{-1}u\).
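The quadratic display for the \(\boldsymbol{\beta}\)-step was likewise lost. Given the stated minimizer, it presumably takes the form below (a reconstruction, with \(D\) the diagonal matrix of majorization weights from (10)):
\[
\boldsymbol{\beta}'V\boldsymbol{\beta}-2u'\boldsymbol{\beta}+\rho\,\boldsymbol{\beta}'D^{-1}\boldsymbol{\beta},
\]
whose gradient \(2(V+\rho D^{-1})\boldsymbol{\beta}-2u\) vanishes precisely at \(\boldsymbol{\beta}=(V+\rho D^{-1})^{-1}u\).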
Cycling through every column monotonically drives down the approximated objective function (9). In our implementation, the value of \(\epsilon\) is chosen as follows. The approximation error of (9) relative to (1) is bounded by
\[
\rho\sum_{i\neq j}\Bigl(\sqrt{\sigma_{ij}^{2}+\epsilon}-|\sigma_{ij}|\Bigr)\le\rho\,p(p-1)\sqrt{\epsilon},
\]
since \(0\le\sqrt{x^{2}+\epsilon}-|x|\le\sqrt{\epsilon}\) for every \(x\).
Note that the algorithm stops when the change of the objective function is less than \(10^{-3}\). We choose \(\epsilon\) such that \(\rho p(p-1)\sqrt{\epsilon}=0.001\), i.e., \(\epsilon=\bigl(0.001/(\rho p(p-1))\bigr)^{2}\), to ensure that the choice of \(\epsilon\) has no more influence on the estimated \(\Sigma\) than the stopping rule.
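As a concrete illustration of this rule (the helper function and the example values are ours, not from the paper):

```python
import math

# Pick eps so that the worst-case smoothing bias rho * p * (p - 1) * sqrt(eps)
# equals the stopping tolerance of the algorithm.
def choose_epsilon(rho: float, p: int, tol: float = 1e-3) -> float:
    return (tol / (rho * p * (p - 1))) ** 2

eps = choose_epsilon(rho=0.1, p=10)   # example values
print(f"{eps:.3e}")                   # 1.235e-08
assert math.isclose(0.1 * 10 * 9 * math.sqrt(eps), 1e-3)
```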
Cite this article
Wang, H. Coordinate descent algorithm for covariance graphical lasso. Stat Comput 24, 521–529 (2014). https://doi.org/10.1007/s11222-013-9385-5