A faster stochastic alternating direction method for large scale convex composite problems

Abstract

Inspired by the fact that certain randomization schemes incorporated into stochastic (proximal) gradient methods allow for a large reduction in computational time, we incorporate such a scheme into the stochastic alternating direction method of multipliers (ADMM), yielding a faster stochastic alternating direction method (FSADM) for solving a class of large scale convex composite problems. In our numerical experiments, we observe that FSADM reduces computational time compared with previous methods. More importantly, we unify the stochastic ADMM for solving general convex and strongly convex composite problems (i.e., the iterative scheme does not change when the problem goes from strongly convex to general convex). In addition, we establish the convergence rates of FSADM for these two cases.
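
The FSADM iteration itself is given in the body of the paper and is not reproduced here. As a rough illustration of the ingredients the abstract points to (a stochastic ADMM whose x-step is driven by a variance-reduced gradient estimator), the following Python sketch plugs an SVRG-style estimator into a linearized ADMM for a toy sparse regression problem. Everything in it, including the names svrg_admm_sketch, soft_threshold, D, c, lam, the update order, the step size, and the epoch schedule, is an illustrative assumption rather than the method of the paper.

    import numpy as np

    def soft_threshold(z, tau):
        """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def svrg_admm_sketch(D, c, A, lam, rho=1.0, eta=None, epochs=30, seed=0):
        """Illustrative SVRG-style stochastic ADMM for
            min_x (1/(2n)) * ||D x - c||^2 + lam * ||A x||_1,
        reformulated as min_{x,y} f(x) + lam * ||y||_1  s.t.  A x = y.
        A generic sketch; NOT the FSADM algorithm of the paper.
        """
        rng = np.random.default_rng(seed)
        n, d = D.shape
        if eta is None:
            # crude step size from a Lipschitz estimate of the component gradients
            eta = 1.0 / np.max(np.sum(D ** 2, axis=1))
        x = np.zeros(d)
        y = np.zeros(A.shape[0])
        u = np.zeros(A.shape[0])                  # scaled dual variable
        lhs = rho * A.T @ A + np.eye(d) / eta     # matrix of the linearized x-subproblem
        for _ in range(epochs):
            x_ref = x.copy()
            full_grad = D.T @ (D @ x_ref - c) / n          # full gradient at the snapshot
            for _ in range(n):
                i = rng.integers(n)
                # variance-reduced (SVRG-style) gradient estimator at the current x
                v = D[i] * (D[i] @ x - c[i]) - D[i] * (D[i] @ x_ref - c[i]) + full_grad
                # y-step: proximal operator of lam * ||.||_1 in scaled form
                y = soft_threshold(A @ x + u, lam / rho)
                # linearized x-step: (rho*A^T A + I/eta) x = x/eta - v + rho*A^T (y - u)
                x = np.linalg.solve(lhs, x / eta - v + rho * A.T @ (y - u))
                # dual update
                u = u + A @ x - y
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        D = rng.standard_normal((200, 50))
        x_true = np.zeros(50)
        x_true[:5] = 1.0
        c = D @ x_true + 0.01 * rng.standard_normal(200)
        x_hat = svrg_admm_sketch(D, c, A=np.eye(50), lam=0.1)
        print("distance to the planted coefficients:", np.linalg.norm(x_hat - x_true))

The snapshot full gradient plus the per-sample correction is what keeps the variance of the estimator v small as the iterates settle; variance reduction of this general kind is one example of the randomization schemes mentioned in the abstract.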


Acknowledgements

This work was supported by the Project of National Natural Science Foundation of China (11731013, 11991022) and the Project of Promoting Scientific Research Ability of Excellent Young Teachers supported by the University of Chinese Academy of Sciences.

Author information

Correspondence to Tong Zhao.


Appendix A: Proof of Lemma 2

Proof

According to the optimality condition of subproblem (5), we have

$${v_{t}} + \rho {A^{T}}\left( {A{x_{t}} - {y_{t}} + {u_{t - 1}}} \right) + \frac{1}{\eta }\left( {{x_{t}} - {x_{t - 1}}} \right) = 0.$$

Rearranging this equality, we get

$${x_{t}} = {x_{t - 1}} - \eta {g_{t}},$$

where \({g_{t}} = {v_{t}} + \rho {A^{T}}\left( {A{x_{t}} - {y_{t}} + {u_{t - 1}}} \right)\).
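
As a quick numerical sanity check of this rearrangement (not part of the original proof), the following NumPy snippet solves the optimality condition for \(x_{t}\) with random data and verifies that the solution coincides with \(x_{t-1} - \eta g_{t}\); the variable names and the dense linear solve are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 8, 5
    A = rng.standard_normal((m, d))
    x_prev = rng.standard_normal(d)      # x_{t-1}
    y_t = rng.standard_normal(m)
    u_prev = rng.standard_normal(m)      # u_{t-1}
    v_t = rng.standard_normal(d)
    rho, eta = 1.5, 0.1

    # The optimality condition
    #   v_t + rho * A^T (A x_t - y_t + u_{t-1}) + (x_t - x_{t-1}) / eta = 0
    # is the linear system
    #   (rho * A^T A + I / eta) x_t = x_{t-1} / eta - v_t + rho * A^T (y_t - u_{t-1}).
    x_t = np.linalg.solve(rho * A.T @ A + np.eye(d) / eta,
                          x_prev / eta - v_t + rho * A.T @ (y_t - u_prev))

    # With g_t = v_t + rho * A^T (A x_t - y_t + u_{t-1}), the update is a gradient-type step.
    g_t = v_t + rho * A.T @ (A @ x_t - y_t + u_prev)
    assert np.allclose(x_t, x_prev - eta * g_t)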

Define the function

$${\varphi_{t}}\left( x \right) = \frac{\rho }{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}},$$

then

$$ \begin{aligned} {x_{t}} =& \underset{x}{\arg \min } {v_{t}^{T}}x + \frac{\rho }{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}} + \frac{1}{{2\eta }}\left\| {x - {x_{t - 1}}} \right\|^{2}\\ =& \underset{x}{\arg \min } \eta {v_{t}^{T}}x + \frac{{\eta \rho }}{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}} + \frac{1}{2}\left\| {x - {x_{t - 1}}} \right\|^{2}\\ =& \underset{x}{\arg \min } \eta {\varphi_{t}}\left( x \right) + \frac{1}{2}{\left\| {x - \left( {{x_{t - 1}} - \eta {v_{t}}} \right)} \right\|^{2}}\\ =& {\text{prox}}_{\eta {\varphi_{t}}}\left( {{x_{t - 1}} - \eta {v_{t}}} \right), \end{aligned} $$

where \({\text{prox}}_{\theta}\left( \cdot \right)\) denotes the proximal operator of the function \(\theta\); the third equality follows from completing the square in \(x\), since terms independent of \(x\) do not change the minimizer. By the \(\mu_{f}\)-convexity of \(f\), we have for any \(x \in {\text{dom}}\left( f \right)\),

$$f\left( x \right) \!\ge\! f\left( {{x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}};$$

by the \(L_{f}\)-smoothness of \(f\), we can further lower bound \(f\left( {x_{t-1}} \right)\) by

$$ \begin{array}{@{}rcl@{}} f\left( {{x_{t - 1}}} \right) &\ge& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) \\&&- \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}}. \end{array} $$
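
For completeness: this is the standard descent lemma for an \(L_{f}\)-smooth function,

$$f\left( {{x_{t}}} \right) \le f\left( {{x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) + \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}},$$

rearranged to isolate \(f\left( {{x_{t - 1}}} \right)\) on the left-hand side.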

Therefore

$$ \begin{aligned} f\left( x \right) \ge& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) - \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}} \\ &+ \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}}\\ =& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) - \frac{{{\eta^{2}}{L_{f}}}}{2}\left\| {{g_{t}}} \right\|^{2} \\ & + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}}, \end{aligned} $$

where the equality uses \({x_{t}} - {x_{t - 1}} = - \eta {g_{t}}\). Adding the term \({\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\) to both sides of this inequality, the inner products on the right-hand side collect as

$$ \begin{aligned} -& \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) \\&+ {\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\\ &= \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t}}} \right) + {\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t}}} \right) + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t - 1}} + {x_{t - 1}} - {x_{t}}} \right) + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \eta \left\| {{g_{t}}} \right\|^{2} + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right). \end{aligned} $$

Hence, using \({g_{t}} = {w_{t}} + {v_{t}}\) (so that the term added above equals \({w_{t}^{T}}\left( {x - {x_{t}}} \right)\)) and \(\eta \le \frac{1}{{L_{f}}}\), we obtain

$$ \begin{aligned} f&\left( x \right) +{w_{t}^{T}}\left( {x - {x_{t}}} \right) \\ \ge f&\left( {{x_{t}}} \right) + {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{\eta }{2}\left( {2 - {L_{f}}\eta } \right)\left\| {{g_{t}}} \right\|^{2} \\ & + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}} + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ \ge f&\left( {{x_{t}}} \right) + {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{\eta }{2}\left\| {{g_{t}}} \right\|^{2} + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}} \\ &+ {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right). \end{aligned} $$

Cite this article

Hu, J., Guo, T. & Zhao, T. A faster stochastic alternating direction method for large scale convex composite problems. Appl Intell 52, 14233–14245 (2022). https://doi.org/10.1007/s10489-022-03319-4
