Abstract
Designing an algorithm for an optimization model often amounts to balancing the amount of information requested from the model against the computational speed one can expect: naturally, the more information is available, the faster the algorithm can be expected to converge. The popular ADMM demands that the objective function be easy to optimize once the coupling constraints are shifted to the objective with multipliers. In many applications, however, this assumption does not hold; instead, often only noisy estimates of the gradient of the objective (or even only of the objective values themselves) are available. This paper aims to bridge this gap. We present a suite of variants of the ADMM in which the trade-offs between the information required about the objective and the resulting computational complexity are made explicit. The new variants make the method applicable to a much broader class of problems, where only noisy estimates of the gradient or of the function values are accessible, and this flexibility is achieved without sacrificing the computational complexity bounds.
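As a toy illustration (entirely our own construction, with hypothetical problem data, and not the algorithm analyzed in this paper), the following sketch runs a single stochastic-gradient x-update inside a standard two-block ADMM loop, where only a noisy gradient of f is available:

```python
import numpy as np

# Toy problem (hypothetical): minimize f(x) + g(y) s.t. x - y = 0, with
# f(x) = 0.5*||x - c||^2 (only a noisy gradient G(x, xi) is available)
# and g(y) = 0.5*||y||^2; the optimum is x* = y* = c/2.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
rho, sigma = 1.0, 0.1                # penalty parameter, gradient-noise level
x, y, lam = np.zeros(2), np.zeros(2), np.zeros(2)
x_avg = np.zeros(2)

for k in range(1, 5001):
    noisy_grad = (x - c) + sigma * rng.standard_normal(2)  # G(x, xi)
    alpha = 1.0 / (k + 10)                                 # decaying step size
    x = x - alpha * (noisy_grad + lam + rho * (x - y))     # inexact x-update
    y = (lam + rho * x) / (1.0 + rho)                      # exact y-update
    lam = lam + rho * (x - y)                              # multiplier update
    x_avg += (x - x_avg) / k                               # ergodic average

print(np.linalg.norm(x_avg - c / 2) < 0.15)
```

Replacing the exact x-subproblem with a single stochastic-gradient step is the kind of trade-off quantified here: less information about f per iteration, at the cost of provably bounded extra iterations.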
References
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Progr. 155(1–2), 57–79 (2016)
d'Aspremont, A., Banerjee, O., Ghaoui, L.E.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30(1), 56–66 (2008)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. arXiv preprint arXiv:1406.4834 (2014)
Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(O(1/k)\) convergence. arXiv preprint arXiv:1312.3040 (2013)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)
Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015)
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Progr. 55(1–3), 293–318 (1992)
Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Int. J. Probab. Stoch. Process. 9(1–2), 1–36 (1983)
Gaivoronskii, A.: Nonstationary stochastic programming problems. Cybern. Syst. Anal. 14(4), 575–579 (1978)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4), 2061–2089 (2013)
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Progr. 156(1), 59–99 (2015)
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Progr. 155(1), 267–305 (2014)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)
Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse numérique 9(2), 41–76 (1975)
Han, D., Yuan, X.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51(6), 3446–3457 (2013)
He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Progr. 92(1), 103–118 (2002)
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. 42(3), 662–691 (2017)
He, B., Xu, H.-K., Yuan, X.: On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM. J. Sci. Comput. 66(3), 1204–1217 (2016)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numerische Mathematik 130(3), 567–577 (2015)
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)
Lan, G.: An optimal method for stochastic composite optimization. Math. Progr. 133(1–2), 365–397 (2012)
Lin, T., Ma, S., Zhang, S.: An extragradient-based alternating direction method for convex minimization. Found. Comput. Math. 17(1), 35–59 (2017)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 547–556. ACM (2009)
Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, Hoboken (1983)
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM J. Sci. Comput. 33(4), 1643–1668 (2011)
Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: Proceedings of the 30th International Conference on Machine Learning, pp. 80–88 (2013)
Polyak, B.: New stochastic approximation type procedures. Automat. i Telemekh 7(98–107), 2 (1990)
Polyak, B., Juditsky, A.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Ruszczyński, A., Syski, W.: A method of aggregate stochastic subgradients with on-line stepsize rules for convex stochastic programming problems. In: Stochastic Programming 84 Part II, pp. 113–131. Springer (1986)
Sacks, J.: Asymptotic distribution of stochastic approximation procedures. Ann. Math. Stat. 29(2), 373–405 (1958)
Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: Advances in Neural Information Processing Systems, pp. 2101–2109 (2010)
Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends® Mach. Learn. 4(2), 107–194 (2011)
Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)
Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: Proceedings of the 30th International Conference on Machine Learning, pp. 392–400 (2013)
Suzuki, T.: Stochastic dual coordinate ascent with alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 736–744 (2014)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc.: Ser. B 67(1), 91–108 (2005)
Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
Wang, H., Banerjee, A.: Online alternating direction method. In: Proceedings of the 29th International Conference on Machine Learning (2012)
Wang, M., Fang, X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Progr. Ser. A 161(1-2), 419–449 (2017)
Zhao, P., Yang, J., Zhang, T., Li, P.: Adaptive stochastic alternating direction method of multipliers. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 69–77 (2015)
Zhong, W., Kwok, J.T.: Fast stochastic alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 46–54 (2014)
Additional information
Bo Jiang: Research of this author was supported in part by National Natural Science Foundation of China (Grant Nos. 11401364 and 11771269) and Program for Innovative Research Team of Shanghai University of Finance and Economics. Shuzhong Zhang: Research of this author was supported in part by National Science Foundation (Grant CMMI-1462408).
Appendix
1.1 Proof of Proposition 2.3
Here we prove Proposition 2.3. Before proceeding, let us state some technical lemmas; their proofs are standard and omitted.
Lemma 7.1
Suppose the function f is smooth and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then, for all x and y,
$$\begin{aligned} f(y) \le f(x) + \nabla f(x)^{\top }(y-x) + \frac{L}{2}\Vert y-x\Vert ^2. \end{aligned}$$
Lemma 7.2
Suppose the function f is smooth and convex, and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then, for all x and y,
$$\begin{aligned} f(y) \ge f(x) + \nabla f(x)^{\top }(y-x) + \frac{1}{2L}\Vert \nabla f(y)-\nabla f(x)\Vert ^2. \end{aligned}$$
Furthermore, if f is \(\kappa \)-strongly convex, then
$$\begin{aligned} f(y) \ge f(x) + \nabla f(x)^{\top }(y-x) + \frac{\kappa }{2}\Vert y-x\Vert ^2. \end{aligned}$$
Lemma 7.1 is the well-known descent lemma; its proof can be found in, e.g., [1]. Lemma 7.2 is similar to Fact 1 in [11] and follows from the (strong) convexity of f together with Lemma 7.1.
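As a quick numerical sanity check (our own toy example, not from the paper), the descent lemma can be verified on a quadratic \(f(x)=\frac{1}{2}x^{\top }Qx\), whose gradient \(Qx\) is Lipschitz with constant \(L=\lambda _{\max }(Q)\):

```python
import numpy as np

# Verify f(y) <= f(x) + grad f(x)^T (y - x) + (L/2)||y - x||^2
# on random pairs (x, y) for a toy quadratic f.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A.T @ A                               # symmetric positive semidefinite
L = np.linalg.eigvalsh(Q).max()           # Lipschitz constant of the gradient

f = lambda z: 0.5 * z @ Q @ z
grad = lambda z: Q @ z

ok = all(
    f(y) <= f(x) + grad(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2) + 1e-9
    for x, y in (rng.standard_normal((2, 4)) for _ in range(1000))
)
print(ok)  # True
```

For a quadratic the inequality is exact up to \((y-x)^{\top }Q(y-x)\le L\Vert y-x\Vert ^2\), so it holds for every sampled pair.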
Proof of (20) in Proposition 2.3
Proof
First, by the optimality condition of the two subproblems in SGADM, we have
and
where \(\partial g(y)\) is a subgradient of g at y. Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\) in (14), the above two inequalities are equivalent to
and
Moreover,
Thus
By the convexity of g(y) and (96),
Since \(\delta _{k+1}=G(x^{k},\xi ^{k+1})-\nabla f(x^{k})\), by (97) we have
which leads to
Using (94), the above further leads to
Furthermore,
Substituting (101) in (100), and dividing both sides by \(\alpha _k\), we get
Finally, (20) follows by summing (102), (99), and (98). \(\square \)
Now we show the second statement in Proposition 2.3.
Proof of (21) in Proposition 2.3
Proof
First, by (15), we have \(P(w^k-\tilde{w}^k)=(w^k-w^{k+1})\), and so
Applying the identity
to the term \((w-\tilde{w}^k)^{\top }M_k(w^k-w^{k+1})\), we obtain
Using (15) again, we have
Noting that \(Q_k=M_kP\) and recalling the definition of these matrices (see (13)), we have
As a result,
Combining (105), (104), and (103), the desired inequality (21) follows. \(\square \)
1.2 Proof of Proposition 3.1
We first show the first part of Proposition 3.1.
Proof of (49) in Proposition 3.1
Proof
First, by the optimality condition of the two subproblems in SGALM, we have
and also
Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\), the above two inequalities are equivalent to
and
Also,
Since \(\delta ^f_{k+1}=S_f(x^{k},\xi ^{k+1})-\nabla f(x^{k})\) and using (107), similar to (100) and (101) we have
Similarly, since \(\delta ^g_{k+1}=S_g(y^{k},\zeta ^{k+1})-\nabla g(y^{k})\) and using (106), we also have
Finally, (49) follows by summing (110), (109), and (108). \(\square \)
Notice that \(\hat{Q}_k=\hat{M}_kP\) and
Inequality (50) in Proposition 3.1 then follows similarly to the derivation of (21) in Proposition 2.3.
1.3 Properties of the Smoothing Function
In this subsection we prove Lemma 4.1. Before doing so, we need some technical preparations, which are summarized in the following lemma.
Lemma 7.3
Let \(\alpha (n)\) be the volume of the unit ball in \(\mathbf {R}^n\), and \(\beta (n)\) be the surface area of the unit sphere in \(\mathbf {R}^n\). We denote by B and \(S_p\) the unit ball and the unit sphere, respectively.
(a) If \(M_p\) is defined as \(M_p=\frac{1}{\alpha (n)}\int _{v \in B}\Vert v\Vert ^pdv\), then
$$\begin{aligned} M_p=\frac{n}{n+p}. \end{aligned}$$(111)
(b) Let I be the identity matrix in \(\mathbf {R}^{n\times n}\); then
$$\begin{aligned} \int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I. \end{aligned}$$(112)
Proof
For (a), we can compute \(M_p\) directly using polar coordinates: since \(\beta (n)=n\,\alpha (n)\),
$$\begin{aligned} M_p=\frac{1}{\alpha (n)}\int _0^1 r^p\,\beta (n)\,r^{n-1}\,dr=\frac{\beta (n)}{\alpha (n)(n+p)}=\frac{n}{n+p}. \end{aligned}$$
For (b), let \(V=vv^{\top }\), so that \(V_{ij}=v_i v_j\). If \(i\ne j\), then by the symmetry of the unit sphere \(S_p\) (i.e., if \(v=(v_1,v_2,\ldots ,v_n)\in S_p\), then \(w\in S_p\) for all \(w=(\pm v_1,\pm v_2,\ldots ,\pm v_n)\)), the integral of \(V_{ij}\) over \(S_p\) equals its own negative under the reflection \(v_j\mapsto -v_j\). Thus, we obtain \(\int _{S_p}V_{ij}dv=0\).
If \(i=j\), then \(V_{ii}=v_i^2\). Since \(\Vert v\Vert =1\) on \(S_p\), we have \(\sum _{i=1}^n\int _{S_p}v_i^2dv=\int _{S_p}\Vert v\Vert ^2dv=\beta (n)\), and by symmetry all n summands are equal. Thus \(\int _{S_p}V_{ii}dv=\frac{\beta (n)}{n}\) for \(i=1,2,\ldots ,n\). Therefore, \(\int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I\). \(\square \)
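Both identities in Lemma 7.3 are easy to confirm by Monte Carlo sampling (our own sanity check, with n = 3 and p = 2; sampling uniformly on the ball via a uniform direction and radius \(U^{1/n}\)):

```python
import numpy as np

# Check (a): mean of ||v||^p over the unit ball is n/(n+p);
# check (b): mean of v v^T over the unit sphere is I/n.
n, p, N = 3, 2, 200000
rng = np.random.default_rng(1)

V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # uniform on the unit sphere
r = rng.random(N) ** (1.0 / n)                  # radii for uniform ball samples

print(np.isclose((r ** p).mean(), n / (n + p), atol=3e-3))  # (a): n/(n+p) = 3/5
print(np.allclose(V.T @ V / N, np.eye(n) / n, atol=3e-3))   # (b): I/3
```

Note that the lemma's integrals differ from these sample means only by the normalizing constants \(\alpha (n)\) and \(\beta (n)\).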
The next three propositions establish part (b) of Lemma 4.1; the proofs of parts (a) and (c) can be found in [48].
Proposition 7.4
If \(f\in C^1_L(\mathbf {R}^n)\), then
Proof
Since \(f\in C^1_L(\mathbf {R}^n)\), we have
\(\square \)
Proposition 7.5
If \(f\in C^1_L(\mathbf {R}^n)\), then
Proof
\(\square \)
Proposition 7.6
If \(f\in C^1_L(\mathbf {R}^n)\), and the \({\mathcal {SZO}}\) is defined as \(g_\mu (x)=\frac{n}{\mu }[f(x+\mu v)-f(x)]v\), then we have
Proof
\(\square \)
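The estimator \(g_\mu \) in Proposition 7.6 is easy to test numerically (our own toy check, not from the paper): for a quadratic f, the odd term in the expansion of \(f(x+\mu v)-f(x)\) integrates to zero over the sphere, so the Monte Carlo mean of \(g_\mu (x)\) should match \(\nabla f(x)=Qx\):

```python
import numpy as np

# Zeroth-order estimator g_mu(x) = (n/mu) * [f(x + mu*v) - f(x)] * v,
# with v uniform on the unit sphere, tested on f(z) = 0.5 * z^T Q z.
n, N, mu = 3, 500000, 1e-2
rng = np.random.default_rng(2)
Q = np.diag([1.0, 2.0, 3.0])
x = np.array([1.0, -0.5, 2.0])
f = lambda z: 0.5 * z @ Q @ z

V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # v uniform on the sphere
X = x + mu * V                                   # all perturbed points
fvals = 0.5 * np.einsum('ij,ij->i', X @ Q, X)    # f(x + mu*v) for each sample
G = (n / mu) * (fvals - f(x))[:, None] * V       # the SZO estimates

print(np.allclose(G.mean(axis=0), Q @ x, atol=0.06))
```

Only two function evaluations per sample are needed, which is exactly what makes the zeroth-order variants applicable when gradients are unavailable.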
Gao, X., Jiang, B. & Zhang, S. On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective. J Sci Comput 76, 327–363 (2018). https://doi.org/10.1007/s10915-017-0621-6
Keywords
- Alternating direction method of multipliers (ADMM)
- Iteration complexity
- Stochastic approximation
- First-order method
- Direct method