A faster stochastic alternating direction method for large scale convex composite problems

Abstract

Inspired by the fact that certain randomization schemes incorporated into stochastic (proximal) gradient methods allow for a large reduction in computational time, we incorporate such a scheme into the stochastic alternating direction method of multipliers (ADMM), yielding a faster stochastic alternating direction method (FSADM) for solving a class of large scale convex composite problems. In our numerical experiments, we observe that FSADM reduces computational time compared with previous methods. More importantly, we unify the stochastic ADMM for solving general convex and strongly convex composite problems (i.e., the iterative scheme does not change when the problem goes from strongly convex to general convex). In addition, we establish the convergence rates of FSADM for these two cases.
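
The FSADM iteration itself is given in the body of the paper and is not reproduced here. As a rough illustration of the ingredients the abstract points to (a stochastic ADMM whose x-step is driven by a variance-reduced gradient estimator), the following Python sketch plugs an SVRG-style estimator into a linearized ADMM for a toy sparse regression problem. Everything in it, including the names svrg_admm_sketch, soft_threshold, D, c, lam, the update order, the step size, and the epoch schedule, is an illustrative assumption rather than the method of the paper.

    import numpy as np

    def soft_threshold(z, tau):
        """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def svrg_admm_sketch(D, c, A, lam, rho=1.0, eta=None, epochs=30, seed=0):
        """Illustrative SVRG-style stochastic ADMM for
            min_x (1/(2n)) * ||D x - c||^2 + lam * ||A x||_1,
        reformulated as min_{x,y} f(x) + lam * ||y||_1  s.t.  A x = y.
        A generic sketch; NOT the FSADM algorithm of the paper.
        """
        rng = np.random.default_rng(seed)
        n, d = D.shape
        if eta is None:
            # crude step size from a Lipschitz estimate of the component gradients
            eta = 1.0 / np.max(np.sum(D ** 2, axis=1))
        x = np.zeros(d)
        y = np.zeros(A.shape[0])
        u = np.zeros(A.shape[0])                  # scaled dual variable
        lhs = rho * A.T @ A + np.eye(d) / eta     # matrix of the linearized x-subproblem
        for _ in range(epochs):
            x_ref = x.copy()
            full_grad = D.T @ (D @ x_ref - c) / n          # full gradient at the snapshot
            for _ in range(n):
                i = rng.integers(n)
                # variance-reduced (SVRG-style) gradient estimator at the current x
                v = D[i] * (D[i] @ x - c[i]) - D[i] * (D[i] @ x_ref - c[i]) + full_grad
                # y-step: proximal operator of lam * ||.||_1 in scaled form
                y = soft_threshold(A @ x + u, lam / rho)
                # linearized x-step: (rho*A^T A + I/eta) x = x/eta - v + rho*A^T (y - u)
                x = np.linalg.solve(lhs, x / eta - v + rho * A.T @ (y - u))
                # dual update
                u = u + A @ x - y
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        D = rng.standard_normal((200, 50))
        x_true = np.zeros(50)
        x_true[:5] = 1.0
        c = D @ x_true + 0.01 * rng.standard_normal(200)
        x_hat = svrg_admm_sketch(D, c, A=np.eye(50), lam=0.1)
        print("distance to the planted coefficients:", np.linalg.norm(x_hat - x_true))

The snapshot full gradient plus the per-sample correction is what keeps the variance of the estimator v small as the iterates settle; variance reduction of this general kind is one example of the randomization schemes mentioned in the abstract.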


Acknowledgements

This work was supported by the Project of National Natural Science Foundation of China (11731013, 11991022) and the Project of Promoting Scientific Research Ability of Excellent Young Teachers supported by the University of Chinese Academy of Sciences.

Author information

Correspondence to Tong Zhao.


Appendix A: Proof of Lemma 2

Proof

According to the optimality condition of subproblem (5), we have

$${v_{t}} + \rho {A^{T}}\left( {A{x_{t}} - {y_{t}} + {u_{t - 1}}} \right) + \frac{1}{\eta }\left( {{x_{t}} - {x_{t - 1}}} \right) = 0.$$

Rearranging this equality, we get

$${x_{t}} = {x_{t - 1}} - \eta {g_{t}},$$

where \({g_{t}} = {v_{t}} + \rho {A^{T}}\left( {A{x_{t}} - {y_{t}} + {u_{t - 1}}} \right)\).
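
As a quick numerical sanity check of this rearrangement (not part of the original proof), the following NumPy snippet solves the optimality condition for \(x_{t}\) with random data and verifies that the solution coincides with \(x_{t-1} - \eta g_{t}\); the variable names and the dense linear solve are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 8, 5
    A = rng.standard_normal((m, d))
    x_prev = rng.standard_normal(d)      # x_{t-1}
    y_t = rng.standard_normal(m)
    u_prev = rng.standard_normal(m)      # u_{t-1}
    v_t = rng.standard_normal(d)
    rho, eta = 1.5, 0.1

    # The optimality condition
    #   v_t + rho * A^T (A x_t - y_t + u_{t-1}) + (x_t - x_{t-1}) / eta = 0
    # is the linear system
    #   (rho * A^T A + I / eta) x_t = x_{t-1} / eta - v_t + rho * A^T (y_t - u_{t-1}).
    x_t = np.linalg.solve(rho * A.T @ A + np.eye(d) / eta,
                          x_prev / eta - v_t + rho * A.T @ (y_t - u_prev))

    # With g_t = v_t + rho * A^T (A x_t - y_t + u_{t-1}), the update is a gradient-type step.
    g_t = v_t + rho * A.T @ (A @ x_t - y_t + u_prev)
    assert np.allclose(x_t, x_prev - eta * g_t)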

Define the function

$${\varphi_{t}}\left( x \right) = \frac{\rho }{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}},$$

then

$$ \begin{aligned} {x_{t}} =& \underset{x}{\arg \min } {v_{t}^{T}}x + \frac{\rho }{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}} + \frac{1}{{2\eta }}\left\| {x - {x_{t - 1}}} \right\|^{2}\\ =& \underset{x}{\arg \min } \eta {v_{t}^{T}}x + \frac{{\eta \rho }}{2}{\left\| {Ax - {y_{t}} + {u_{t - 1}}} \right\|^{2}} + \frac{1}{2}\left\| {x - {x_{t - 1}}} \right\|^{2}\\ =& \underset{x}{\arg \min } \eta {\varphi_{t}}\left( x \right) + \frac{1}{2}{\left\| {x - \left( {{x_{t - 1}} - \eta {v_{t}}} \right)} \right\|^{2}}\\ =& {\text{prox}}_{\eta {\varphi_{t}}}\left( {{x_{t - 1}} - \eta {v_{t}}} \right), \end{aligned} $$

where \({\text{prox}}_{\theta}\left( \cdot \right)\) denotes the proximal operator of the function \(\theta\); the third equality follows from completing the square in \(x\), since terms independent of \(x\) do not change the minimizer. By the \(\mu_{f}\)-convexity of \(f\), we have for any \(x \in {\text{dom}}\left( f \right)\),

$$f\left( x \right) \!\ge\! f\left( {{x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}};$$

by the \(L_{f}\)-smoothness of \(f\), we can further lower bound \(f\left( {x_{t-1}} \right)\) by

$$ \begin{array}{@{}rcl@{}} f\left( {{x_{t - 1}}} \right) &\ge& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) \\&&- \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}}. \end{array} $$
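
For completeness: this is the standard descent lemma for an \(L_{f}\)-smooth function,

$$f\left( {{x_{t}}} \right) \le f\left( {{x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) + \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}},$$

rearranged to isolate \(f\left( {{x_{t - 1}}} \right)\) on the left-hand side.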

Therefore

$$ \begin{aligned} f\left( x \right) \ge& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) - \frac{{{L_{f}}}}{2}{\left\| {{x_{t}} - {x_{t - 1}}} \right\|^{2}} \\ &+ \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}}\\ =& f\left( {{x_{t}}} \right) - \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) - \frac{{{\eta^{2}}{L_{f}}}}{2}\left\| {{g_{t}}} \right\|^{2} \\ & + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}}, \end{aligned} $$

where the equality uses \({x_{t}} - {x_{t - 1}} = - \eta {g_{t}}\). Adding the term \({\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\) to both sides of this inequality, the inner products on the right-hand side collect as

$$ \begin{aligned} -& \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {{x_{t}} - {x_{t - 1}}} \right) + \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t - 1}}} \right) \\&+ {\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\\ &= \nabla f{\left( {{x_{t - 1}}} \right)^{T}}\left( {x - {x_{t}}} \right) + {\left( {{g_{t}} - {v_{t}}} \right)^{T}}\left( {x - {x_{t}}} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t}}} \right) + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t - 1}} + {x_{t - 1}} - {x_{t}}} \right) + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ &= {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \eta \left\| {{g_{t}}} \right\|^{2} + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right). \end{aligned} $$

Hence, using \({g_{t}} = {w_{t}} + {v_{t}}\) (so that the term added above equals \({w_{t}^{T}}\left( {x - {x_{t}}} \right)\)) and \(\eta \le \frac{1}{{L_{f}}}\), we obtain

$$ \begin{aligned} f&\left( x \right) +{w_{t}^{T}}\left( {x - {x_{t}}} \right) \\ \ge f&\left( {{x_{t}}} \right) + {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{\eta }{2}\left( {2 - {L_{f}}\eta } \right)\left\| {{g_{t}}} \right\|^{2} \\ & + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}} + {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right)\\ \ge f&\left( {{x_{t}}} \right) + {g_{t}^{T}}\left( {x - {x_{t - 1}}} \right) + \frac{\eta }{2}\left\| {{g_{t}}} \right\|^{2} + \frac{{{\mu_{f}}}}{2}{\left\| {x - {x_{t - 1}}} \right\|^{2}} \\ &+ {\left( {{v_{t}} - \nabla f\left( {{x_{t - 1}}} \right)} \right)^{T}}\left( {{x_{t}} - x} \right). \end{aligned} $$

Cite this article

Hu, J., Guo, T. & Zhao, T. A faster stochastic alternating direction method for large scale convex composite problems. Appl Intell 52, 14233–14245 (2022). https://doi.org/10.1007/s10489-022-03319-4
