
On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective


Abstract

Designing an algorithm for an optimization model often amounts to striking a balance between the amount of information the algorithm requests from the model on the one hand, and the computational speed one can expect from it on the other. Naturally, the more information is available, the faster the algorithm can be expected to converge. The popular ADMM requires that the objective function be easy to optimize once the coupling constraints are shifted to the objective with multipliers. In many applications, however, this assumption does not hold; instead, often only noisy estimates of the gradient of the objective, or even only of the objective value itself, are available. This paper aims to bridge this gap. We present a suite of variants of the ADMM in which the trade-offs between the information required on the objective and the resulting computational complexity are made explicit. The new variants make the method applicable to a much broader class of problems, where only noisy estimates of the gradient or of the function values are accessible, and this flexibility is achieved without sacrificing the computational complexity bounds.


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.

References

  1. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  2. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  3. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  4. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  5. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)

  6. d'Aspremont, A., Banerjee, O., El Ghaoui, L.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30(1), 56–66 (2008)

  7. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. arXiv preprint arXiv:1406.4834 (2014)

  8. Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with O(1/k) convergence. arXiv preprint arXiv:1312.3040 (2013)

  9. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)

  10. Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)

  11. Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015)

  12. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  13. Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Int. J. Probab. Stoch. Process. 9(1–2), 1–36 (1983)

  14. Gaivoronskii, A.: Nonstationary stochastic programming problems. Cybern. Syst. Anal. 14(4), 575–579 (1978)

  15. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)

  16. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4), 2061–2089 (2013)

  17. Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)

  18. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1), 59–99 (2015)

  19. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1), 267–305 (2014)

  20. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)

  21. Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, Recherche Opérationnelle. Analyse Numérique 9(2), 41–76 (1975)

  22. Han, D., Yuan, X.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51(6), 3446–3457 (2013)

  23. He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)

  24. He, B., Liao, L.-Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92(1), 103–118 (2002)

  25. He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. 42(3), 662–691 (2017)

  26. He, B., Xu, H.-K., Yuan, X.: On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM. J. Sci. Comput. 66(3), 1204–1217 (2016)

  27. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  28. He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)

  29. Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)

  30. Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1–2), 365–397 (2012)

  31. Lin, T., Ma, S., Zhang, S.: An extragradient-based alternating direction method for convex minimization. Found. Comput. Math. 17(1), 35–59 (2017)

  32. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)

  33. Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)

  34. Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 547–556. ACM (2009)

  35. Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

  36. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)

  37. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  38. Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, Hoboken (1983)

  39. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)

  40. Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM J. Sci. Comput. 33(4), 1643–1668 (2011)

  41. Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: Proceedings of the 30th International Conference on Machine Learning, pp. 80–88 (2013)

  42. Polyak, B.: New stochastic approximation type procedures. Automat. i Telemekh. 7, 98–107 (1990)

  43. Polyak, B., Juditsky, A.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)

  44. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  45. Ruszczyński, A., Syski, W.: A method of aggregate stochastic subgradients with on-line stepsize rules for convex stochastic programming problems. In: Stochastic Programming 84 Part II, pp. 113–131. Springer (1986)

  46. Sacks, J.: Asymptotic distribution of stochastic approximation procedures. Ann. Math. Stat. 29(2), 373–405 (1958)

  47. Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: Advances in Neural Information Processing Systems, pp. 2101–2109 (2010)

  48. Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends Mach. Learn. 4(2), 107–194 (2011)

  49. Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)

  50. Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: Proceedings of the 30th International Conference on Machine Learning, pp. 392–400 (2013)

  51. Suzuki, T.: Stochastic dual coordinate ascent with alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 736–744 (2014)

  52. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. Ser. B 67(1), 91–108 (2005)

  53. Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)

  54. Wang, H., Banerjee, A.: Online alternating direction method. In: Proceedings of the 29th International Conference on Machine Learning (2012)

  55. Wang, M., Fang, X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Program. Ser. A 161(1–2), 419–449 (2017)

  56. Zhao, P., Yang, J., Zhang, T., Li, P.: Adaptive stochastic alternating direction method of multipliers. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 69–77 (2015)

  57. Zhong, W., Kwok, J.T.: Fast stochastic alternating direction method of multipliers. In: Proceedings of the 31st International Conference on Machine Learning, pp. 46–54 (2014)


Author information

Corresponding author: Bo Jiang.

Additional information

Bo Jiang: Research of this author was supported in part by the National Natural Science Foundation of China (Grant Nos. 11401364 and 11771269) and by the Program for Innovative Research Team of Shanghai University of Finance and Economics. Shuzhong Zhang: Research of this author was supported in part by the National Science Foundation (Grant CMMI-1462408).

Appendix

1.1 Proof of Proposition 2.3

Here we will prove Proposition 2.3. Before proceeding, let us present some technical lemmas without proof.

Lemma 7.1

Suppose the function f is smooth and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then we have

$$\begin{aligned} f(x) \le f(y) + \nabla f(y)^{\top }(x-y) + \frac{L}{2}\Vert x - y\Vert ^2. \end{aligned}$$
(93)

Lemma 7.2

Suppose the function f is smooth and convex, and its gradient is Lipschitz continuous with constant L, i.e., (2) holds. Then we have

$$\begin{aligned} (x-y)^{\top }\nabla f(z) \le f(x)-f(y)+\frac{L}{2}\Vert z-y\Vert ^2 . \end{aligned}$$
(94)

Furthermore, if f is \(\kappa \)-strongly convex, we have

$$\begin{aligned} (x-y)^{\top }\nabla f(z) \le f(x)-f(y)+\frac{L}{2}\Vert z-y\Vert ^2-\frac{\kappa }{2}\Vert x-z\Vert ^2 . \end{aligned}$$
(95)

Lemma 7.1 is the well-known descent lemma; a proof can be found in, e.g., [1]. Lemma 7.2 is similar to Fact 1 in [11] and follows from the (strong) convexity of f together with Lemma 7.1.
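For completeness, (94) can be derived by splitting the left-hand side and bounding the two resulting terms by convexity and by (93), respectively:

$$\begin{aligned} (x-y)^{\top }\nabla f(z)= & {} (x-z)^{\top }\nabla f(z)+(z-y)^{\top }\nabla f(z) \\\le & {} \left( f(x)-f(z)\right) +\left( f(z)-f(y)+\frac{L}{2}\Vert z-y\Vert ^2\right) \\= & {} f(x)-f(y)+\frac{L}{2}\Vert z-y\Vert ^2, \end{aligned}$$

where the bound on the first term is the gradient inequality for the convex function f, and the bound on the second term is (93) applied at the pair (y, z). If f is in addition \(\kappa \)-strongly convex, the first bound sharpens to \((x-z)^{\top }\nabla f(z)\le f(x)-f(z)-\frac{\kappa }{2}\Vert x-z\Vert ^2\), which gives (95).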

Proof of (20) in Proposition 2.3

Proof

First, by the optimality conditions of the two subproblems in SGADM, we have

$$\begin{aligned}&(y-y^{k+1})^{\top }\left( \partial g(y^{k+1})-B^{\top }\left( \lambda ^k-\gamma (Ax^k+By^{k+1}-b)\right) -H(y^k-y^{k+1})\right) \\&\quad \ge 0, \quad \forall y\in \mathcal {Y}, \end{aligned}$$

and

$$\begin{aligned}&(x-x^{k+1})^{\top }\left( x^{k+1}-\left( x^k-\alpha _k \left( G(x^k,\xi ^{k+1})-A^{\top }(\lambda ^k-\gamma (Ax^k+By^{k+1}-b))\right) \right) \right) \\&\quad \ge 0, \quad \forall x\in \mathcal {X}, \end{aligned}$$

where \(\partial g(y)\) is a subgradient of g at y. Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\) in (14), the above two inequalities are equivalent to

$$\begin{aligned} (y-\tilde{y}^{k})^{\top }\left( \partial g(\tilde{y}^{k})-B^{\top }\tilde{\lambda }^k-H(y^k-y^{k+1})\right) \ge 0, \quad \forall y\in \mathcal {Y}, \end{aligned}$$
(96)

and

$$\begin{aligned} (x-\tilde{x}^{k})^{\top }\left( \alpha _k\left( G(x^k,\xi ^{k+1})-A^{\top }\tilde{\lambda }^k\right) -(x^k-\tilde{x}^k)\right) \ge 0, \quad \forall x\in \mathcal {X}. \end{aligned}$$
(97)

Moreover, since \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+B\tilde{y}^{k}-b)\) (recall that \(\tilde{y}^k=y^{k+1}\)), we have

$$\begin{aligned} (A\tilde{x}^k+B\tilde{y}^k-b)-\left( -A(x^k-\tilde{x}^k)+\frac{1}{\gamma }\left( \lambda ^k-\tilde{\lambda }^k\right) \right) =0 . \end{aligned}$$

Thus

$$\begin{aligned} (\lambda -\tilde{\lambda }^k)^{\top }(A\tilde{x}^k+B\tilde{y}^k-b) =\left( \lambda -\tilde{\lambda }^k\right) ^{\top }\left( -A\left( x^k-\tilde{x}^k\right) +\frac{1}{\gamma }\left( \lambda ^k-\tilde{\lambda }^k\right) \right) . \end{aligned}$$
(98)

By the convexity of g(y) and (96),

$$\begin{aligned} g(y)-g(\tilde{y}^k)+(y-\tilde{y}^{k})^{\top }\left( -B^{\top }\tilde{\lambda }^k\right) \ge (y-\tilde{y}^{k})^{\top }H(y^k-\tilde{y}^{k}), \quad \forall y\in \mathcal {Y}. \end{aligned}$$
(99)

Since \(\delta _{k+1}=G(x^{k},\xi ^{k+1})-\nabla f(x^{k})\), inequality (97) can be rewritten as

$$\begin{aligned} (x-\tilde{x}^{k})^{\top }\left( \alpha _k(\nabla f(x^k)-A^{\top }\tilde{\lambda }^k)+\alpha _k\delta _{k+1}-(x^k-\tilde{x}^k)\right) \ge 0, \quad \forall x\in \mathcal {X}\end{aligned}$$

which leads to

$$\begin{aligned}&(x-\tilde{x}^{k})^{\top }\left( \alpha _k(\nabla f(x^k)-A^{\top }\tilde{\lambda }^k)\right) \\&\quad \ge (x-\tilde{x}^k)^{\top } \left( x^k-\tilde{x}^k\right) -\,\alpha _k\left( x-\tilde{x}^k\right) ^{\top }\delta _{k+1}, \quad \forall x\in \mathcal {X}. \end{aligned}$$

Using (94), the above further leads to

$$\begin{aligned}&\alpha _k(f(x)-f(\tilde{x}^k))+(x-\tilde{x}^{k})^{\top }(-\alpha _k A^{\top }\tilde{\lambda }^k) \nonumber \\&\quad \ge (x-\tilde{x}^k)^{\top }\left( x^k-\tilde{x}^k\right) -\alpha _k \left( x-\tilde{x}^k\right) ^{\top }\delta _{k+1}-\frac{\alpha _kL}{2}\Vert x^k-\tilde{x}^k\Vert ^2, \quad \forall x\in \mathcal {X}.\nonumber \\ \end{aligned}$$
(100)

Furthermore, by Young's inequality,

$$\begin{aligned} (x-\tilde{x}^k)^{\top }\delta _{k+1}= & {} (x-x^k)^{\top }\delta _{k+1}+(x^k-\tilde{x}^k)^{\top }\delta _{k+1} \nonumber \\\le & {} (x-x^k)^{\top }\delta _{k+1}+\frac{\eta _k}{2}\Vert x^k-\tilde{x}^k\Vert ^2+\frac{\Vert \delta _{k+1}\Vert ^2}{2\eta _k}. \end{aligned}$$
(101)

Substituting (101) into (100) and dividing both sides by \(\alpha _k\), we get

$$\begin{aligned}&f(x)-f(\tilde{x}^k)+(x-\tilde{x}^{k})^{\top }(-A^{\top }\tilde{\lambda }^k) \nonumber \\&\quad \ge \frac{(x-\tilde{x}^k)^{\top }(x^k-\tilde{x}^k)}{\alpha _k} -(x-x^k)^{\top }\delta _{k+1}-\frac{\Vert \delta _{k+1}\Vert ^2}{2\eta _k}-\frac{\eta _k+L}{2}\Vert x^k-\tilde{x}^k\Vert ^2.\qquad \end{aligned}$$
(102)

Finally, (20) follows by summing (102), (99), and (98). \(\square \)

Now we show the second statement in Proposition 2.3.

Proof of (21) in Proposition 2.3

Proof

First, by (15), we have \(P(w^k-\tilde{w}^k)=(w^k-w^{k+1})\), and so

$$\begin{aligned} (w-\tilde{w}^k)^{\top } Q_k(w^k-\tilde{w}^k)=(w-\tilde{w}^k)^{\top } M_kP(w^k-\tilde{w}^k)=(w-\tilde{w}^k)^{\top } M_k(w^k-w^{k+1}) . \end{aligned}$$

Applying the identity

$$\begin{aligned} (a-b)^{\top }M_k(c-d)=\frac{1}{2}\left( \Vert a-d\Vert _{M_k}^2-\Vert a-c\Vert _{M_k}^2\right) +\frac{1}{2}\left( \Vert c-b\Vert _{M_k}^2-\Vert d-b\Vert _{M_k}^2\right) \end{aligned}$$

to the term \((w-\tilde{w}^k)^{\top }M_k(w^k-w^{k+1})\), we obtain

$$\begin{aligned}&(w-\tilde{w}^k)^{\top }M_k(w^k-w^{k+1}) \nonumber \\&\quad = \frac{1}{2}\left( \Vert w-w^{k+1}\Vert _{M_k}^2-\Vert w-w^k\Vert _{M_k}^2\right) +\frac{1}{2}\left( \Vert w^k-\tilde{w}^k\Vert _{M_k}^2-\Vert w^{k+1}-\tilde{w}^k\Vert _{M_k}^2\right) .\nonumber \\ \end{aligned}$$
(103)

Using (15) again, we have

$$\begin{aligned}&\Vert w^k-\tilde{w}^k\Vert _{M_k}^2-\Vert w^{k+1}-\tilde{w}^k\Vert _{M_k}^2\nonumber \\&\quad =\Vert w^k-\tilde{w}^k\Vert _{M_k}^2-\Vert (w^k-\tilde{w}^k)-(w^k-w^{k+1})\Vert _{M_k}^2\nonumber \\&\quad =\Vert w^k-\tilde{w}^k\Vert _{M_k}^2-\Vert (w^k-\tilde{w}^k)-P(w^k-\tilde{w}^{k})\Vert _{M_k}^2\nonumber \\&\quad =(w^k-\tilde{w}^k)^{\top }(2M_kP-P^{\top }M_kP)(w^k-\tilde{w}^k). \end{aligned}$$
(104)

Noting that \(Q_k=M_kP\) and recalling the definitions of these matrices (see (13)), we have

$$\begin{aligned} 2M_kP-P^{\top }M_kP=2Q_k-P^{\top }Q_k=\left( \begin{array}{ccc} H &{} 0 &{} 0 \\ 0 &{} \frac{1}{\alpha _k}I_{n_x}-\gamma A^{\top }A &{} A^{\top } \\ 0 &{} -A &{} \frac{1}{\gamma }I_m \\ \end{array} \right) . \end{aligned}$$

As a result,

$$\begin{aligned}&(w^k-\tilde{w}^k)^{\top }(2M_kP-P^{\top }M_kP)(w^k-\tilde{w}^k)\nonumber \\&\quad = \Vert y^k-\tilde{y}^k\Vert _H^2+\frac{1}{\gamma }\Vert \lambda ^k-\tilde{\lambda }^k\Vert ^2+(x^k-\tilde{x}^k)^{\top }\left( \frac{1}{\alpha _k}I_{n_x}-\gamma A^{\top }A\right) (x^k-\tilde{x}^k) \nonumber \\&\quad \ge (x^k-\tilde{x}^k)^{\top }\left( \frac{1}{\alpha _k}I_{n_x}-\gamma A^{\top }A\right) (x^k-\tilde{x}^k). \end{aligned}$$
(105)

Combining (105), (104), and (103), the desired inequality (21) follows. \(\square \)
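As an aside (an illustration added here, not part of the original argument), the SGADM updates used above can be read off from the optimality conditions (96)–(97): a proximal minimization in y, a single projected stochastic gradient step in x, and a multiplier update. The following minimal Python sketch implements them on a toy instance with \(\mathcal {X}=\mathbf {R}^{n_x}\), \(\mathcal {Y}=\mathbf {R}^{n_y}\) and \(H=\tau I\); the quadratic choices of f and g, the step size \(\alpha _k=1/\sqrt{k}\), and the form of the \(\lambda \)-update are illustrative assumptions rather than prescriptions of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: min_{x,y} f(x) + g(y)  s.t.  A x + B y = b, with
# f(x) = 0.5*||C x - d||^2 (accessed only through a noisy gradient oracle)
# and g(y) = 0.5*rho*||y||^2 (simple enough to solve the y-subproblem exactly).
nx, ny, m = 5, 4, 3
A = rng.standard_normal((m, nx))
B = rng.standard_normal((m, ny))
b = rng.standard_normal(m)
C = rng.standard_normal((8, nx))
d = rng.standard_normal(8)
rho, gamma, tau, sigma = 1.0, 1.0, 1.0, 0.1  # tau: proximal weight, H = tau*I

def noisy_grad_f(x):
    # stochastic first-order oracle G(x, xi): true gradient plus zero-mean noise
    return C.T @ (C @ x - d) + sigma * rng.standard_normal(nx)

x, y, lam = np.zeros(nx), np.zeros(ny), np.zeros(m)
Hy = (rho + tau) * np.eye(ny) + gamma * B.T @ B  # y-subproblem Hessian (constant here)
for k in range(1, 2001):
    alpha = 1.0 / np.sqrt(k)  # diminishing step size (an illustrative choice)
    # y-subproblem: argmin_y g(y) - lam^T (A x + B y - b)
    #   + (gamma/2)||A x + B y - b||^2 + (1/2)||y - y^k||_H^2
    y = np.linalg.solve(Hy, B.T @ (lam - gamma * (A @ x - b)) + tau * y)
    # x-update: one stochastic gradient step (here X = R^nx, so no projection)
    lam_tilde = lam - gamma * (A @ x + B @ y - b)
    x = x - alpha * (noisy_grad_f(x) - A.T @ lam_tilde)
    # multiplier update (assumed form, analogous to the classical ADMM)
    lam = lam - gamma * (A @ x + B @ y - b)

print("primal residual:", np.linalg.norm(A @ x + B @ y - b))
```

With g quadratic the y-subproblem reduces to a linear solve; any prox-friendly g would serve equally well in this sketch.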

1.2 Proof of Proposition 3.1

We begin with the first part of Proposition 3.1.

Proof of (49) in Proposition 3.1

Proof

By the optimality conditions of the two subproblems in SGALM, we have

$$\begin{aligned}&(y-y^{k+1})^{\top }\left( y^{k+1}-y^k+\beta _k\left( S_g(y^k,\zeta ^{k+1})-B^{\top }(\lambda ^k-\gamma (Ax^k+By^{k}-b))\right) \right) \\&\quad \ge 0, \quad \forall y\in \mathcal {Y}, \end{aligned}$$

and also

$$\begin{aligned}&(x-x^{k+1})^{\top }\left( x^{k+1}-x^k+\alpha _k \left( S_f(x^k,\xi ^{k+1})-A^{\top }(\lambda ^k-\gamma (Ax^k+By^{k+1}-b))\right) \right) \\&\quad \ge 0, \quad \forall x\in \mathcal {X}. \end{aligned}$$

Using \(\tilde{\lambda }^k=\lambda ^k-\gamma (Ax^k+By^{k+1}-b)\) and the definition of \(\tilde{w}^k\), the above two inequalities are equivalent to

$$\begin{aligned} (y-\tilde{y}^{k})^{\top }\left( \beta _k\left( S_g(y^k,\zeta ^{k+1})-B^{\top }\tilde{\lambda }^k\right) -(I_{n_y}-\beta _k\gamma B^{\top }B)(y^k-\tilde{y}^k)\right) \ge 0, \quad \forall y\in \mathcal {Y}, \end{aligned}$$
(106)

and

$$\begin{aligned} (x-\tilde{x}^{k})^{\top }\left( \alpha _k(S_f(x^k,\xi ^{k+1})-A^{\top }\tilde{\lambda }^k)-(x^k-\tilde{x}^k)\right) \ge 0, \quad \forall x\in \mathcal {X}. \end{aligned}$$
(107)

Also,

$$\begin{aligned} (\lambda -\tilde{\lambda }^k)^{\top }(A\tilde{x}^k+B\tilde{y}^k-b)=(\lambda -\tilde{\lambda }^k)^{\top } \left( -A(x^k-\tilde{x}^k)+\frac{1}{\gamma }(\lambda ^k-\tilde{\lambda }^k)\right) . \end{aligned}$$
(108)

Since \(\delta ^f_{k+1}=S_f(x^{k},\xi ^{k+1})-\nabla f(x^{k})\), using (107) and arguing as in (100) and (101), we have

$$\begin{aligned}&f(x)-f(\tilde{x}^k)+(x-\tilde{x}^{k})^{\top }(-A^{\top }\tilde{\lambda }^k) \nonumber \\&\quad \ge \frac{(x-\tilde{x}^k)^{\top }(x^k-\tilde{x}^k)}{\alpha _k} -(x-x^k)^{\top }\delta ^f_{k+1}-\frac{\Vert \delta ^f_{k+1}\Vert ^2}{2\eta _k}-\frac{\eta _k+L}{2}\Vert x^k-\tilde{x}^k\Vert ^2.\qquad \quad \end{aligned}$$
(109)

Similarly, since \(\delta ^g_{k+1}=S_g(y^{k},\zeta ^{k+1})-\nabla g(y^{k})\), using (106) we also have

$$\begin{aligned}&g(y)-g(\tilde{y}^k)+(y-\tilde{y}^{k})^{\top }(-B^{\top }\tilde{\lambda }^k) \nonumber \\&\quad \ge (y-\tilde{y}^k)^{\top }\left( \frac{1}{\beta _k}I_{n_y}-\gamma B^{\top }B\right) (y^k-\tilde{y}^k) \nonumber \\&\qquad -(y-y^k)^{\top }\delta ^g_{k+1}-\frac{\Vert \delta ^g_{k+1}\Vert ^2}{2\eta _k}-\frac{\eta _k+L}{2}\Vert y^k-\tilde{y}^k\Vert ^2. \end{aligned}$$
(110)

Finally, (49) follows by summing (110), (109), and (108). \(\square \)

Notice that \(\hat{Q}_k=\hat{M}_kP\) and

$$\begin{aligned} 2\hat{M}_kP-P^{\top }\hat{M}_kP= & {} \left( \begin{array}{ccc} H_k &{} 0 &{} 0 \\ 0 &{} \frac{1}{\alpha _k}I_{n_x}-\gamma A^{\top }A &{} A^{\top } \\ 0 &{} -A &{} \frac{1}{\gamma }I_m \\ \end{array} \right) \\= & {} \left( \begin{array}{ccc} \frac{1}{\beta _k}I_{n_y}-\gamma B^{\top }B &{} 0 &{} 0 \\ 0 &{} \frac{1}{\alpha _k}I_{n_x}-\gamma A^{\top }A &{} A^{\top } \\ 0 &{} -A &{} \frac{1}{\gamma }I_m \\ \end{array} \right) . \end{aligned}$$
Inequality (50) in Proposition 3.1 then follows in the same way as the derivation of (21) in Proposition 2.3.
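For contrast with the SGADM sketch in Appendix 1.1 (an illustration, not part of the original text): in SGALM the y-block is also updated by a single stochastic gradient step rather than by an exact proximal minimization. A minimal version of that update, read off from the first optimality condition of this subsection, is sketched below; the name `noisy_grad_g` and the choice \(\mathcal {Y}=\mathbf {R}^{n_y}\) are assumptions for illustration.

```python
import numpy as np

def sgalm_y_update(y, x, lam, A, B, b, gamma, beta, noisy_grad_g):
    # One SGALM y-step: a stochastic gradient step on the augmented
    # Lagrangian evaluated at (x^k, y^k); with Y = R^ny the projection
    # onto Y is the identity. noisy_grad_g is a stochastic oracle
    # S_g(y, zeta) for grad g (a hypothetical user-supplied function).
    grad_aug = noisy_grad_g(y) - B.T @ (lam - gamma * (A @ x + B @ y - b))
    return y - beta * grad_aug
```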

1.3 Properties of the Smoothing Function

In this subsection we prove Lemma 4.1. Before doing so, we need some technical preparations, which are summarized in the following lemma.

Lemma 7.3

Let \(\alpha (n)\) denote the volume of the unit ball in \(\mathbf {R}^n\), and \(\beta (n)\) the surface area of the unit sphere in \(\mathbf {R}^n\). We denote by B and \(S_p\) the unit ball and the unit sphere, respectively.

(a) If \(M_p\) is defined as \(M_p=\frac{1}{\alpha (n)}\int _{v \in B}\Vert v\Vert ^pdv\), then

$$\begin{aligned} M_p=\frac{n}{n+p}. \end{aligned}$$
(111)

(b) Let I be the identity matrix in \(\mathbf {R}^{n\times n}\); then

$$\begin{aligned} \int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I. \end{aligned}$$
(112)

Proof

For (a), we can compute \(M_p\) directly using polar coordinates:

$$\begin{aligned} M_p=\frac{1}{\alpha (n)}\int _{B}\Vert v\Vert ^pdv=\frac{1}{\alpha (n)}\int _{S_p}\int _{0}^{1}r^{p}r^{n-1}\,dr\,d\theta =\frac{1}{n+p}\frac{\beta (n)}{\alpha (n)}=\frac{n}{n+p}, \end{aligned}$$

where the last equality uses \(\beta (n)=n\alpha (n)\).

For (b), let \(V=vv^{\top }\), so that \(V_{ij}=v_i v_j\). If \(i\ne j\), then by the symmetry of the unit sphere \(S_p\) (i.e., if \(v=(v_1,v_2,\ldots ,v_n)\in S_p\), then \(w\in S_p\) for all \(w=(\pm v_1,\pm v_2,\ldots ,\pm v_n)\)), we have

$$\begin{aligned} \int _{S_p}V_{ij}dv=\int _{S_p}v_i v_jdv=\int _{S_p}-v_i v_jdv=\int _{S_p}-V_{ij}dv. \end{aligned}$$

Thus, we obtain \(\int _{S_p}V_{ij}dv=0\).

If \(i=j\), then \(V_{ii}=v_i^2\). Since

$$\begin{aligned} \int _{S_p}(v_1^2+v_2^2+\ldots +v_n^2)dv=\int _{S_p}\Vert v\Vert ^2dv=\beta (n). \end{aligned}$$

it follows by symmetry that

$$\begin{aligned} \int _{S_p}v_1^2dv=\int _{S_p}v_2^2dv=\ldots =\int _{S_p}v_n^2dv=\frac{\beta (n)}{n}. \end{aligned}$$

Thus \(\int _{S_p}V_{ii}dv=\frac{\beta (n)}{n}\) for \(i=1,2,\ldots ,n\), and therefore \(\int _{S_p}vv^{\top }dv=\frac{\beta (n)}{n}I\). \(\square \)
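As a quick numerical sanity check (an addition for illustration, not part of the original proof), both identities can be verified by Monte Carlo sampling; the dimension and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200000

# Uniform samples on the unit sphere, then in the unit ball (the radius r
# has density proportional to r^{n-1}, i.e. r = U^(1/n) for U uniform on [0,1]).
u = rng.standard_normal((N, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)
v_ball = u * (rng.random(N) ** (1.0 / n))[:, None]

# (111): the average of ||v||^p over the unit ball equals n/(n+p)
for p in (1, 2, 3):
    est = np.mean(np.linalg.norm(v_ball, axis=1) ** p)
    print(f"p={p}: estimate {est:.4f} vs n/(n+p) = {n / (n + p):.4f}")

# (112): the average of v v^T over the unit sphere equals I/n
# (i.e. the surface integral divided by beta(n))
avg_vvT = (u[:, :, None] * u[:, None, :]).mean(axis=0)
print("max deviation from I/n:", np.abs(avg_vvT - np.eye(n) / n).max())
```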

The next three propositions establish part (b) of Lemma 4.1; the proofs of parts (a) and (c) of Lemma 4.1 can be found in [48].

Proposition 7.4

If \(f\in C^1_L(\mathbf {R}^n)\), then

$$\begin{aligned} |f_\mu (x)-f(x)|\le \frac{L\mu ^2}{2}. \end{aligned}$$
(113)

Proof

Since \(f\in C^1_L(\mathbf {R}^n)\) and \(\int _{B}v\,dv=0\), we have

$$\begin{aligned} |f_\mu (x)-f(x)|= & {} \left| \frac{1}{\alpha (n)}\int _{B}f(x+\mu v)dv-f(x)\right| \\= & {} \left| \frac{1}{\alpha (n)}\int _{B}\left( f(x+\mu v)-f(x)-\nabla f(x)^{\top }\mu v\right) dv\right| \\\le & {} \frac{1}{\alpha (n)}\int _{B}\left| f(x+\mu v)-f(x)-\nabla f(x)^{\top }\mu v\right| dv \\\le & {} \frac{1}{\alpha (n)}\int _{B}\frac{L\mu ^2}{2}\Vert v\Vert ^2dv \\{\mathop {=}\limits ^{(111)}}&\frac{L\mu ^2}{2}\frac{n}{n+2}\le \frac{L\mu ^2}{2} . \end{aligned}$$

\(\square \)

Proposition 7.5

If \(f\in C^1_L(\mathbf {R}^n)\), then

$$\begin{aligned} \Vert \nabla f_\mu (x)-\nabla f(x)\Vert \le \frac{\mu nL}{2} . \end{aligned}$$
(114)

Proof

$$\begin{aligned}&\Vert \nabla f_\mu (x)-\nabla f(x)\Vert \\&\quad =\left\| \frac{1}{\beta (n)}\left[ \frac{n}{\mu }\int _{S_p}f(x+\mu v)vdv\right] -\nabla f(x)\right\| \\&\quad {\mathop {=}\limits ^{(112)}}\left\| \frac{1}{\beta (n)}\left[ \frac{n}{\mu }\int _{S_p}f(x+\mu v)vdv-\int _{S_p}\frac{n}{\mu }f(x)vdv-\int _{S_p}\frac{n}{\mu }\langle \nabla f(x),\mu v\rangle vdv\right] \right\| \\&\quad \le \frac{n}{\beta (n)\mu }\int _{S_p}|f(x+\mu v)-f(x)-\langle \nabla f(x),\mu v\rangle |\Vert v\Vert dv \\&\quad \le \frac{n}{\beta (n)\mu }\frac{L\mu ^2}{2}\int _{S_p}\Vert v\Vert ^3dv=\frac{\mu n L}{2} . \end{aligned}$$

\(\square \)

Proposition 7.6

If \(f\in C^1_L(\mathbf {R}^n)\) and the output of the \({\mathcal {SZO}}\) is defined as \(g_\mu (x)=\frac{n}{\mu }[f(x+\mu v)-f(x)]v\), with v uniformly distributed on the unit sphere, then we have

$$\begin{aligned} \mathbf{\mathsf E}_v\left[ \Vert g_\mu (x)\Vert ^2\right] \le 2n\Vert \nabla f(x)\Vert ^2+\frac{\mu ^2}{2}L^2n^2 . \end{aligned}$$
(115)

Proof

$$\begin{aligned} \mathbf{\mathsf E}_v[\Vert g_\mu (x)\Vert ^2]= & {} \frac{1}{\beta (n)}\int _{S_p}\frac{n^2}{\mu ^2}|f(x+\mu v)-f(x)|^2\Vert v\Vert ^2dv \\= & {} \frac{n^2}{\beta (n)\mu ^2}\int _{S_p}\left[ f(x+\mu v)-f(x)-\langle \nabla f(x),\mu v\rangle +\langle \nabla f(x),\mu v\rangle \right] ^2dv \\\le & {} \frac{n^2}{\beta (n)\mu ^2}\int _{S_p} \left[ 2\left( f(x+\mu v)-f(x)-\langle \nabla f(x),\mu v\rangle \right) ^2\right. \\&\left. +\,2\left( \langle \nabla f(x),\mu v\rangle \right) ^2\right] dv \\\le & {} \frac{n^2}{\beta (n)\mu ^2}\left[ \int _{S_p}2\left( \frac{L\mu ^2}{2}\Vert v\Vert ^2\right) ^2dv+2\mu ^2\int _{S_p}\nabla f(x)^{\top }vv^{\top }\nabla f(x)dv\right] \\&{\mathop {=}\limits ^{(112)}}\frac{n^2}{\beta (n)\mu ^2}\left[ \frac{L^2\mu ^4}{2}\beta (n)+2\mu ^2\frac{\beta (n)}{n}\Vert \nabla f(x)\Vert ^2\right] \\= & {} 2n\Vert \nabla f(x)\Vert ^2+\frac{\mu ^2}{2}L^2n^2 . \end{aligned}$$

\(\square \)
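As an illustration (an addition, not part of the original text), the following sketch implements the \({\mathcal {SZO}}\) estimator \(g_\mu \) for a quadratic test function and compares its empirical second moment with the bound (115); the test function, dimension, and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, M = 6, 1e-2, 100000

# Smooth test function f(z) = 0.5*||Q z||^2, so grad f(z) = Q^T Q z and
# grad f is Lipschitz continuous with constant L = ||Q^T Q||_2.
Q = rng.standard_normal((n, n))
grad_f = lambda z: Q.T @ (Q @ z)
L = np.linalg.norm(Q.T @ Q, 2)

x = rng.standard_normal(n)
v = rng.standard_normal((M, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)  # v uniform on the unit sphere

# SZO estimator g_mu(x) = (n/mu) * [f(x + mu v) - f(x)] * v
f_x = 0.5 * np.linalg.norm(Q @ x) ** 2
f_pert = 0.5 * np.sum(((x + mu * v) @ Q.T) ** 2, axis=1)
g = (n / mu) * (f_pert - f_x)[:, None] * v

# Empirical second moment vs. the right-hand side of (115)
second_moment = np.mean(np.sum(g ** 2, axis=1))
bound = 2 * n * np.linalg.norm(grad_f(x)) ** 2 + 0.5 * mu ** 2 * L ** 2 * n ** 2
print(f"E||g_mu||^2 ~ {second_moment:.2f} <= bound {bound:.2f}")
# E_v[g_mu(x)] = grad f_mu(x), which is close to grad f(x) for small mu, cf. (114)
print("bias vs grad f(x):", np.linalg.norm(g.mean(axis=0) - grad_f(x)))
```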


Cite this article

Gao, X., Jiang, B. & Zhang, S. On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective. J Sci Comput 76, 327–363 (2018). https://doi.org/10.1007/s10915-017-0621-6
