Abstract
Designing an algorithm for an optimization model often amounts to balancing the amount of information requested from the model against the computational speed one can expect: naturally, the more information is available, the faster the algorithm can be expected to converge. The popular ADMM demands that the objective function be easy to optimize once the coupling constraints are shifted to the objective with multipliers. In many applications, however, this assumption does not hold; instead, often only noisy estimates of the gradient of the objective (or even only of the objective values themselves) are available. This paper aims to bridge this gap. We present a suite of variants of the ADMM in which the trade-offs between the information required about the objective and the resulting computational complexity are made explicit. The new variants make the method applicable to a much broader class of problems, where only noisy estimates of the gradient or of the function values are accessible, and this flexibility is achieved without sacrificing the computational complexity bounds.
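As a toy illustration (entirely our own construction, with hypothetical problem data, and not the algorithm analyzed in this paper), the following sketch runs a single stochastic-gradient x-update inside a standard two-block ADMM loop, where only a noisy gradient of f is available:

```python
import numpy as np

# Toy problem (hypothetical): minimize f(x) + g(y) s.t. x - y = 0, with
# f(x) = 0.5*||x - c||^2 (only a noisy gradient G(x, xi) is available)
# and g(y) = 0.5*||y||^2; the optimum is x* = y* = c/2.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
rho, sigma = 1.0, 0.1                # penalty parameter, gradient-noise level
x, y, lam = np.zeros(2), np.zeros(2), np.zeros(2)
x_avg = np.zeros(2)

for k in range(1, 5001):
    noisy_grad = (x - c) + sigma * rng.standard_normal(2)  # G(x, xi)
    alpha = 1.0 / (k + 10)                                 # decaying step size
    x = x - alpha * (noisy_grad + lam + rho * (x - y))     # inexact x-update
    y = (lam + rho * x) / (1.0 + rho)                      # exact y-update
    lam = lam + rho * (x - y)                              # multiplier update
    x_avg += (x - x_avg) / k                               # ergodic average

print(np.linalg.norm(x_avg - c / 2) < 0.15)
```

Replacing the exact x-subproblem with a single stochastic-gradient step is the kind of trade-off quantified here: less information about f per iteration, at the cost of provably bounded extra iterations.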
References
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Progr. 155(1–2), 57–79 (2016)
d'Aspremont, A., Banerjee, O., Ghaoui, L.E.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30(1), 56–66 (2008)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. arXiv preprint arXiv:1406.4834 (2014)
Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(O(1/k)\) convergence. arXiv preprint arXiv:1312.3040 (2013)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)
Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015)
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Progr. 55(1–3), 293–318 (1992)
Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Int. J. Probab. Stoch. Process. 9(1–2), 1–36 (1983)
Gaivoronskii, A.: Nonstationary stochastic programming problems. Cybern. Syst. Anal. 14(4), 575–579 (1978)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4), 2061–2089 (2013)
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Progr. 156(1), 59–99 (2015)
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Progr. 155(1), 267–305 (2014)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)
Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(2), 41–76 (1975)
Han, D., Yuan, X.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51(6), 3446–3457 (2013)
He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Progr. 92(1), 103–118 (2002)
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. 42(3), 662–691 (2017)
He, B., Xu, H.-K., Yuan, X.: On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM. J. Sci. Comput. 66(3), 1204–1217 (2016)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numerische Mathematik 130(3), 567–577 (2015)
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)
Lan, G.: An optimal method for stochastic composite optimization. Math. Progr. 133(1–2), 365–397 (2012)
Lin, T., Ma, S., Zhang, S.: An extragradient-based alternating direction method for convex minimization. Found. Comput. Math. 17(1), 35–59 (2017)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 547–556. ACM (2009)
Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, Hoboken (1983)
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM J. Sci. Comput. 33(4), 1643–1668 (2011)
Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: Proceedings of the 30th International Conference on Machine Learning, pp. 80–88 (2013)
Polyak, B.: New stochastic approximation type procedures. Automat. i Telemekh 7(98–107), 2 (1990)
Polyak, B., Juditsky, A.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Ruszczyński, A., Syski, W.: A method of aggregate stochastic subgradients with on-line stepsize rules for convex stochastic programming problems. In: Stochastic Programming 84 Part II, pp. 113–131. Springer (1986)
Sacks, J.: Asymptotic distribution of stochastic approximation procedures. Ann. Math. Stat. 29(2), 373–405 (1958)
Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: Advances in Neural Information Processing Systems, pp. 2101–2109 (2010)
Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends® Mach. Learn. 4(2), 107–194 (2011)
Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)
Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: Proceedings of the 30th International Conference on Machine Learning, pp. 392–400 (2013)
Suzuki, T.: Stochastic dual coordinate ascent with alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 736–744 (2014)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc.: Ser. B 67(1), 91–108 (2005)
Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
Wang, H., Banerjee, A.: Online alternating direction method. In: Proceedings of the 29th International Conference on Machine Learning (2012)
Wang, M., Fang, X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Progr. Ser. A 161(1-2), 419–449 (2017)
Zhao, P., Yang, J., Zhang, T., Li, P.: Adaptive stochastic alternating direction method of multipliers. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 69–77 (2015)
Zhong, W., Kwok, J.T.: Fast stochastic alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 46–54 (2014)
Additional information
Bo Jiang: Research of this author was supported in part by National Natural Science Foundation of China (Grant Nos. 11401364 and 11771269) and Program for Innovative Research Team of Shanghai University of Finance and Economics. Shuzhong Zhang: Research of this author was supported in part by National Science Foundation (Grant CMMI-1462408).
Appendix
1.1 Proof of Proposition 2.3
Here we prove Proposition 2.3. Before proceeding, let us state some technical lemmas; their proofs are standard and omitted.
Lemma 7.1
Suppose the function f is smooth and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then, for all x and y,
$$\begin{aligned} f(y) \le f(x) + \nabla f(x)^{\top }(y-x) + \frac{L}{2}\Vert y-x\Vert ^2. \end{aligned}$$
Lemma 7.2
Suppose the function f is smooth and convex, and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then, for all x and y,
$$\begin{aligned} f(y) \ge f(x) + \nabla f(x)^{\top }(y-x) + \frac{1}{2L}\Vert \nabla f(y)-\nabla f(x)\Vert ^2. \end{aligned}$$
Furthermore, if f is \(\kappa \)-strongly convex, then
$$\begin{aligned} f(y) \ge f(x) + \nabla f(x)^{\top }(y-x) + \frac{\kappa }{2}\Vert y-x\Vert ^2. \end{aligned}$$
Lemma 7.1 is the well-known descent lemma; its proof can be found in, e.g., [1]. Lemma 7.2 is similar to Fact 1 in [11] and follows from the (strong) convexity of f together with Lemma 7.1.
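As a quick numerical sanity check (our own toy example, not from the paper), the descent lemma can be verified on a quadratic \(f(x)=\frac{1}{2}x^{\top }Qx\), whose gradient \(Qx\) is Lipschitz with constant \(L=\lambda _{\max }(Q)\):

```python
import numpy as np

# Verify f(y) <= f(x) + grad f(x)^T (y - x) + (L/2)||y - x||^2
# on random pairs (x, y) for a toy quadratic f.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A.T @ A                               # symmetric positive semidefinite
L = np.linalg.eigvalsh(Q).max()           # Lipschitz constant of the gradient

f = lambda z: 0.5 * z @ Q @ z
grad = lambda z: Q @ z

ok = all(
    f(y) <= f(x) + grad(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2) + 1e-9
    for x, y in (rng.standard_normal((2, 4)) for _ in range(1000))
)
print(ok)  # True
```

For a quadratic the inequality is exact up to \((y-x)^{\top }Q(y-x)\le L\Vert y-x\Vert ^2\), so it holds for every sampled pair.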
Proof of (20) in Proposition 2.3
Proof
First, by the optimality condition of the two subproblems in SGADM, we have
and
where \(\partial g(y)\) is a subgradient of g at y. Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\) in (14), the above two inequalities are equivalent to
and
Moreover,
Thus
By the convexity of g(y) and (96),
Since \(\delta _{k+1}=G(x^{k},\xi ^{k+1})-\nabla f(x^{k})\), by (97) we have
which leads to
Using (94), the above further leads to
Furthermore,
Substituting (101) in (100), and dividing both sides by \(\alpha _k\), we get
Finally, (20) follows by summing (102), (99), and (98). \(\square \)
Now we show the second statement in Proposition 2.3.
Proof of (21) in Proposition 2.3
Proof
First, by (15), we have \(P(w^k-\tilde{w}^k)=(w^k-w^{k+1})\), and so
Applying the identity
to the term \((w-\tilde{w}^k)^{\top }M_k(w^k-w^{k+1})\), we obtain
Using (15) again, we have
Noting that \(Q_k=M_kP\) and recalling the definition of these matrices (see (13)), we have
As a result,
Combining (105), (104), and (103), the desired inequality (21) follows. \(\square \)
1.2 Proof of Proposition 3.1
We first show the first part of Proposition 3.1.
Proof of (49) in Proposition 3.1
Proof
First, by the optimality condition of the two subproblems in SGALM, we have
and also
Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\), the above two inequalities are equivalent to
and
Also,
Since \(\delta ^f_{k+1}=S_f(x^{k},\xi ^{k+1})-\nabla f(x^{k})\) and using (107), similar to (100) and (101) we have
Similarly, since \(\delta ^g_{k+1}=S_g(y^{k},\zeta ^{k+1})-\nabla g(y^{k})\) and using (106), we also have
Finally, (49) follows by summing (110), (109), and (108). \(\square \)
Notice that \(\hat{Q}_k=\hat{M}_kP\) and
Inequality (50) in Proposition 3.1 then follows similarly to the derivation of (21) in Proposition 2.3.
1.3 Properties of the Smoothing Function
In this subsection we prove Lemma 4.1. Before doing so, we need some technical preparations, which are summarized in the following lemma.
Lemma 7.3
Let \(\alpha (n)\) be the volume of the unit ball in \(\mathbf {R}^n\), and \(\beta (n)\) be the surface area of the unit sphere in \(\mathbf {R}^n\). We denote by B and \(S_p\) the unit ball and the unit sphere, respectively.
(a) If \(M_p\) is defined as \(M_p=\frac{1}{\alpha (n)}\int _{v \in B}\Vert v\Vert ^pdv\), then
$$\begin{aligned} M_p=\frac{n}{n+p}. \end{aligned}$$(111)
(b) Let I be the identity matrix in \(\mathbf {R}^{n\times n}\); then
$$\begin{aligned} \int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I. \end{aligned}$$(112)
Proof
For (a), we can compute \(M_p\) directly using polar coordinates: since \(\beta (n)=n\,\alpha (n)\),
$$\begin{aligned} M_p=\frac{1}{\alpha (n)}\int _0^1 r^p\,\beta (n)\,r^{n-1}\,dr=\frac{\beta (n)}{\alpha (n)(n+p)}=\frac{n}{n+p}. \end{aligned}$$
For (b), let \(V=vv^{\top }\), so that \(V_{ij}=v_i v_j\). If \(i\ne j\), then by the symmetry of the unit sphere \(S_p\) (i.e., if \(v=(v_1,v_2,\ldots ,v_n)\in S_p\), then \(w\in S_p\) for all \(w=(\pm v_1,\pm v_2,\ldots ,\pm v_n)\)), the integral of \(V_{ij}\) over \(S_p\) equals its own negative under the reflection \(v_j\mapsto -v_j\). Thus, we obtain \(\int _{S_p}V_{ij}dv=0\).
If \(i=j\), then \(V_{ii}=v_i^2\). Since \(\Vert v\Vert =1\) on \(S_p\), we have \(\sum _{i=1}^n\int _{S_p}v_i^2dv=\int _{S_p}\Vert v\Vert ^2dv=\beta (n)\), and by symmetry all n summands are equal. Thus \(\int _{S_p}V_{ii}dv=\frac{\beta (n)}{n}\) for \(i=1,2,\ldots ,n\). Therefore, \(\int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I\). \(\square \)
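Both identities in Lemma 7.3 are easy to confirm by Monte Carlo sampling (our own sanity check, with n = 3 and p = 2; sampling uniformly on the ball via a uniform direction and radius \(U^{1/n}\)):

```python
import numpy as np

# Check (a): mean of ||v||^p over the unit ball is n/(n+p);
# check (b): mean of v v^T over the unit sphere is I/n.
n, p, N = 3, 2, 200000
rng = np.random.default_rng(1)

V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # uniform on the unit sphere
r = rng.random(N) ** (1.0 / n)                  # radii for uniform ball samples

print(np.isclose((r ** p).mean(), n / (n + p), atol=3e-3))  # (a): n/(n+p) = 3/5
print(np.allclose(V.T @ V / N, np.eye(n) / n, atol=3e-3))   # (b): I/3
```

Note that the lemma's integrals differ from these sample means only by the normalizing constants \(\alpha (n)\) and \(\beta (n)\).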
The next three propositions establish part (b) of Lemma 4.1; the proofs of parts (a) and (c) can be found in [48].
Proposition 7.4
If \(f\in C^1_L(\mathbf {R}^n)\), then
Proof
Since \(f\in C^1_L(\mathbf {R}^n)\), we have
\(\square \)
Proposition 7.5
If \(f\in C^1_L(\mathbf {R}^n)\), then
Proof
\(\square \)
Proposition 7.6
If \(f\in C^1_L(\mathbf {R}^n)\), and the \({\mathcal {SZO}}\) is defined as \(g_\mu (x)=\frac{n}{\mu }[f(x+\mu v)-f(x)]v\), then we have
Proof
\(\square \)
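The estimator \(g_\mu \) in Proposition 7.6 is easy to test numerically (our own toy check, not from the paper): for a quadratic f, the odd term in the expansion of \(f(x+\mu v)-f(x)\) integrates to zero over the sphere, so the Monte Carlo mean of \(g_\mu (x)\) should match \(\nabla f(x)=Qx\):

```python
import numpy as np

# Zeroth-order estimator g_mu(x) = (n/mu) * [f(x + mu*v) - f(x)] * v,
# with v uniform on the unit sphere, tested on f(z) = 0.5 * z^T Q z.
n, N, mu = 3, 500000, 1e-2
rng = np.random.default_rng(2)
Q = np.diag([1.0, 2.0, 3.0])
x = np.array([1.0, -0.5, 2.0])
f = lambda z: 0.5 * z @ Q @ z

V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # v uniform on the sphere
X = x + mu * V                                   # all perturbed points
fvals = 0.5 * np.einsum('ij,ij->i', X @ Q, X)    # f(x + mu*v) for each sample
G = (n / mu) * (fvals - f(x))[:, None] * V       # the SZO estimates

print(np.allclose(G.mean(axis=0), Q @ x, atol=0.06))
```

Only two function evaluations per sample are needed, which is exactly what makes the zeroth-order variants applicable when gradients are unavailable.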
Gao, X., Jiang, B. & Zhang, S. On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective. J Sci Comput 76, 327–363 (2018). https://doi.org/10.1007/s10915-017-0621-6
Keywords
- Alternating direction method of multipliers (ADMM)
- Iteration complexity
- Stochastic approximation
- First-order method
- Direct method