Abstract
This paper proposes and analyzes an inexact variant of the proximal generalized alternating direction method of multipliers (ADMM) for solving separable linearly constrained convex optimization problems. In this variant, the first subproblem is solved approximately, subject to a relative error condition, whereas the second one is assumed to be easy to solve. In many applications of the ADMM, one of the subproblems admits a closed-form solution; this is the case, for instance, for \(\ell _1\)-regularized convex composite optimization problems. The proposed method possesses iteration-complexity bounds similar to those of its exact version. More specifically, it is shown that, for a given tolerance \(\rho >0\), an approximate solution of the Lagrangian system associated with the problem under consideration is obtained in at most \(\mathcal {O}(1/\rho ^2)\) (resp. \(\mathcal {O}(1/\rho )\) in the ergodic case) iterations. Numerical experiments are presented to illustrate the performance of the proposed scheme.
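The paper's Algorithm 1 is not reproduced in this excerpt; as a point of reference, the following is a minimal sketch of the *exact* generalized ADMM with relaxation factor \(\alpha \in (0,2)\), applied to a separable toy \(\ell_1\)-regularized problem whose second subproblem has the closed-form soft-thresholding solution mentioned above. It omits the relative-error inexactness of the first subproblem that the paper introduces, and all function names and parameter values are illustrative.

```python
# Illustrative sketch: exact generalized ADMM (relaxation factor alpha in (0, 2))
# for the separable toy problem
#     min_x  (1/2) * sum_i (a_i * x_i - b_i)^2 + lam * ||x||_1,
# rewritten as  min f(x) + g(y)  subject to  x - y = 0.
# This is NOT the paper's Algorithm 1: both subproblems are solved exactly here.

def soft_threshold(v, t):
    """Proximal map of t*|.| (the closed-form solution of the y-subproblem)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def generalized_admm(a, b, lam, alpha=1.5, beta=1.0, iters=300):
    n = len(a)
    x = [0.0] * n
    y = [0.0] * n
    gamma = [0.0] * n  # Lagrange multiplier for the constraint x - y = 0
    for _ in range(iters):
        for i in range(n):
            # x-step: minimize (1/2)(a_i x - b_i)^2 + gamma_i x + (beta/2)(x - y_i)^2
            x[i] = (a[i] * b[i] - gamma[i] + beta * y[i]) / (a[i] ** 2 + beta)
            # generalized (relaxed) point: alpha * x_new + (1 - alpha) * y_old
            x_hat = alpha * x[i] + (1.0 - alpha) * y[i]
            # y-step: soft-thresholding (closed form)
            y[i] = soft_threshold(x_hat + gamma[i] / beta, lam / beta)
            # multiplier update
            gamma[i] = gamma[i] + beta * (x_hat - y[i])
    return x, y

# With a_i = 1 the minimizer is soft_threshold(b_i, lam) componentwise,
# so this run should approach x = (2.0, 0.0).
x, y = generalized_admm(a=[1.0, 1.0], b=[3.0, 0.5], lam=1.0)
```

The relaxation point `x_hat` is where the generalized variant differs from classical ADMM (\(\alpha = 1\)); values \(\alpha > 1\) often accelerate convergence in practice.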
The authors' work was supported in part by CAPES, FAPEG/GO, and CNPq (Grants 302666/2017-6, 312559/2019-4, and 408123/2018-4).
Appendix A: Proof of Proposition 1
The proof of Proposition 1 is divided into the following steps. First, we prove the inclusion in Proposition 1(a); we then establish two technical lemmas which will be used to prove the inequality in Proposition 1(a) and the statement in Proposition 1(b).
First, note that the definitions of \(\tilde{\gamma }_{k}\) and \(\gamma _{k}\) given in (12) and (14), respectively, imply that
Proof of the inclusion in Proposition 1(a): From the inclusion in (11) and the first relation in (14), we have
Now, the first-order optimality condition for (13) and the definition of \(\gamma _k\) in (14) imply that
On the other hand, it follows from (44) that
which, combined with (46), yields
From the second equality in (14), we obtain
Therefore, the inclusion in Proposition 1(a) now follows by combining (45), (47), (48) and the definitions of M and T in (16).
In order to prove the remaining statements of Proposition 1, we need to establish two technical results. Note first that the relation in (44) implies that
For simplicity, we also consider the following symmetric matrices
It is easy to verify that S, N and P are positive semidefinite for every \(\beta >0\) and \(\alpha \in (0,2)\).
Lemma 1
Let \(\{z_k\}\) and \(\{{{\tilde{z}}}_k\}\) be as in (18). Then, for every \(k\ge 1\), the following hold:
and
where the matrices M, N and P are as in (16) and (50).
Proof
Using the fact that \({\tilde{z}}_{k}-z_{k-1}=({\tilde{x}}_{k}-x_{k-1},y_{k}-y_{k-1},\tilde{\gamma }_{k}-\gamma _{k-1})\) and the definition of M in (16), we obtain
On the other hand, equality (44) implies that
and
Combining the last three equalities, we find
Thus, (51) follows from the last equality and the definition of N in (50).
Let us now prove (52). Using \({\tilde{z}}_{k}-z_{k}=({\tilde{x}}_{k}-x_{k},0,\tilde{\gamma }_{k}-\gamma _{k})\) [see (18)] and the definition of M in (16), we have
It follows from (44) and some algebraic manipulations that
Therefore, the desired equality now follows by combining the last two equalities and the definition of P in (50). \(\square\)
Lemma 2
Let \(\{(x_k,y_k,\gamma _k)\}\) be generated by Algorithm 1. Then, the following hold:
- (a) \(2\langle {B(y_1-y_{0})},{\gamma _1-\gamma _{0}}\rangle \ge \Vert y_1-y_{0}\Vert _{H}^2 - 4d_0^2\), where \(d_0\) is as in (19);
- (b) \(2\langle B(y_k-y_{k-1}),\gamma _k-\gamma _{k-1} \rangle \ge \Vert y_k-y_{k-1}\Vert _{H}^2-\Vert y_{k-1}-y_{k-2}\Vert _{H}^2\), for every \(k\ge 2\).
Proof
(a) Consider \(z_0,z_1\) and \({{\tilde{z}}}_1\) as in (18), and let \(z^{*}:=(x^*,y^*,\gamma ^*)\in \varOmega ^{*}\) be arbitrary (see Assumption 1). Note that, in view of the definition of \(d_0\) in (19), to establish (a) it suffices to prove that
where M is as in (16). Let us then show (53). From the definitions of M and \(\{z_k\}\), we have
Hence, we obtain
where the last inequality is due to \(\Vert z+z^{\prime }\Vert _{M}^2\le 2\left( \Vert z\Vert _{M}^2+\Vert z^{\prime }\Vert _{M}^2\right)\) for all \(z, z^{\prime }\). We will now prove that
Since we have already proved that the inclusion in Proposition 1(a) holds, we have \(M(z_0-z_1) \in T(\tilde{z}_1)\) where M and T are as in (16). Thus, using that \(0 \in T(z^*)\) and T is monotone, we obtain \(\langle M(z_0-z_1),z^*- {\tilde{z}}_1 \rangle \le 0\). Hence,
Using (52), the inequality in (11), and the first equality in (14) (all with \(k=1\)), we have
where P is as in (50). Now, (51) with \(k=1\) becomes
where N is as in (50). Combining the last three inequalities and the fact that \(\tau _{2}<1\) (see Algorithm 1), we find
where the last equality is due to the fact that \(P-N=-\alpha (2-\alpha )S\), with S given in (49). The last inequality, (49) with \(k=1\) and the fact that \(\alpha \in (0,2-\tau _{1})\) yield
which implies that (55) holds. Therefore, (a) now follows by combining (54) and (55).
(b) From the first-order optimality condition for (13) and the second relation in (14), we obtain
Hence, for every \(k\ge 2\), using the above inclusion with \(j \leftarrow k\) and \(j \leftarrow k-1\) and the monotonicity of \(\partial g\), we have
where the last inequality is due to the fact that \(2\left\langle Hy,y^{\prime }\right\rangle \le \Vert y\Vert _{H}^{2}+\Vert y^{\prime }\Vert _{H}^{2}\) for all \(y, y^{\prime }\). Therefore, (b) follows trivially from the last inequality. \(\square\)
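For completeness, the elementary bound used in the last step can be verified directly. Assuming, as the seminorm notation \(\Vert \cdot \Vert _H\) indicates, that \(H\) is symmetric positive semidefinite, one has

```latex
0 \le \Vert y - y'\Vert_{H}^{2}
  = \langle H(y - y'),\, y - y'\rangle
  = \Vert y\Vert_{H}^{2} - 2\langle Hy,\, y'\rangle + \Vert y'\Vert_{H}^{2},
```

which rearranges to \(2\langle Hy,y'\rangle \le \Vert y\Vert _{H}^{2}+\Vert y'\Vert _{H}^{2}\).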
We are now ready to prove the remaining statements of Proposition 1.
Proof of the inequality in Proposition 1(a) Using (52) and the first relation in (14), we have
where the inequality is due to the second condition in (11). It follows from the last inequality, (51) and the fact that \({\sigma }\ge \tau _{2}\) [see (20)] that
where
We will show that \(a_{k}\ge \eta _{k}-\eta _{k-1}\), where the sequence \(\{\eta _{k}\}\) is defined in (21). From (49), we find
which, combined with the definition of \(a_k\), yields
Hence, using the definitions of N, S and P in (49) and (50), we obtain
where
Now, from the definition of \(\sigma\) given in (20), we obtain \({\sigma }\ge (1+\alpha \tau _{1})/(1+\alpha (2-\alpha ))\). Hence, \({\hat{\xi }}\ge 0\) and
where the last inequality is due to the fact that \(\alpha \in (0,2-\tau _{1})\). Moreover, since \(\sigma \in (0,1)\) (see (20)), we find
Thus, \(\bar{\xi } >{\hat{\xi }}\ge 0\), and \(\tilde{\xi }\ge 0\). Hence, from (58), Lemma 2 and the fact that \(\bar{\xi }=\alpha ^{3}\xi\) [see (20) and (59)], it follows that
which, combined with the definition of \(\{\eta _{k}\}\) in (21), yields \(a_{k}\ge \eta _{k}-\eta _{k-1}\) for every \(k\ge 1\). Hence, the desired inequality now follows from (57).
Proof of Proposition 1(b) First, for every \(z^*=(x^*,y^*,\gamma ^*)\in \varOmega ^*\), we have
Now, since \(M(z_{k-1}-z_k)\in T(\tilde{z}_k)\) (see (22)), \(0 \in T(z^*)\), and T is monotone, we trivially obtain \(\langle M(z_{k-1}-z_k),{\tilde{z}}_k-z^* \rangle \ge 0\). Therefore, combining the last two inequalities and (22), we obtain
which is equivalent to the desired inequality.
Adona, V.A., Gonçalves, M.L.N. & Melo, J.G. An inexact proximal generalized alternating direction method of multipliers. Comput Optim Appl 76, 621–647 (2020). https://doi.org/10.1007/s10589-020-00191-1
Keywords
- Generalized alternating direction method of multipliers
- Convex program
- Relative error criterion
- Pointwise iteration-complexity
- Ergodic iteration-complexity