
Parallelized ADMM with General Objectives for Deep Learning

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14489)


Abstract

While considerable effort has been devoted to improving models that employ regularized functions, non-convex models are difficult to solve directly with most stochastic gradient optimization algorithms. The Alternating Direction Method of Multipliers (ADMM) has emerged as a promising approach for both convex and non-convex problems, offering fast convergence and effective constraint handling. However, ADMM has yet to make significant inroads into non-convex regularized deep learning, and parallelized ADMM techniques for non-convex objectives remain undeveloped. To address these gaps, this paper applies ADMM to general (non-convex regularized) deep learning tasks and presents a comprehensive analysis of its convergence properties. Furthermore, a parallelized ADMM framework is proposed to fill this gap for general objectives. Experimental results show that ADMM converges stably on non-convex objectives and outperforms ADMM with convex objectives; we also evaluate the computational efficiency of the proposed parallelized framework.
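As context for how a parallelized ADMM of this kind can be organized, below is a minimal sketch of global-consensus ADMM, in which per-worker weight copies w_i are tied to a consensus variable z so that the w_i-updates decouple and can run in parallel. This is not the paper's algorithm: the consensus splitting, the hard-thresholding prox (an l0 penalty standing in for a generic non-convex regularizer), and all names (grad_fs, rho, lam, lr) are illustrative assumptions.

```python
import numpy as np

# Sketch only: global-consensus ADMM for
#   min_z  sum_i f_i(z) + g(z),
# rewritten with per-worker copies w_i and constraints w_i = z.
# The w_i-updates are independent given z, which is what a parallel
# implementation would distribute across workers.

def hard_threshold(v, t):
    """Prox of the non-convex l0 penalty t*||.||_0: keep entries with
    |v_j| > sqrt(2t), zero out the rest."""
    out = v.copy()
    out[np.abs(v) <= np.sqrt(2.0 * t)] = 0.0
    return out

def consensus_admm(grad_fs, w0, rho=1.0, lam=0.01, lr=0.01,
                   n_iter=100, inner_steps=5):
    n = len(grad_fs)
    ws = [w0.copy() for _ in range(n)]          # per-worker primal copies
    us = [np.zeros_like(w0) for _ in range(n)]  # scaled dual variables
    z = w0.copy()                               # consensus variable
    for _ in range(n_iter):
        # w_i-update: approximately minimize
        #   f_i(w_i) + (rho/2)||w_i - z + u_i||^2
        # by a few gradient steps (inexact, since f_i may be non-convex).
        for i in range(n):
            for _ in range(inner_steps):
                ws[i] -= lr * (grad_fs[i](ws[i]) + rho * (ws[i] - z + us[i]))
        # z-update: average the (w_i + u_i), then apply the prox of g
        # with step lam / (n * rho).
        v = sum(w + u for w, u in zip(ws, us)) / n
        z = hard_threshold(v, lam / (n * rho))
        # Dual ascent on the scaled multipliers.
        for i in range(n):
            us[i] += ws[i] - z
    return z

# Toy usage: two workers, each holding a shard of a least-squares problem.
rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((20, 5)), rng.standard_normal((20, 5))
b1, b2 = rng.standard_normal(20), rng.standard_normal(20)
grad_fs = [lambda w: A1.T @ (A1 @ w - b1),
           lambda w: A2.T @ (A2 @ w - b2)]
z = consensus_admm(grad_fs, np.zeros(5))
```

In a distributed implementation, each worker would own one grad_fs[i] and its (ws[i], us[i]) pair; only the averaged vector v and the broadcast consensus variable z would cross worker boundaries in each iteration.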



Author information


Corresponding author

Correspondence to Linbo Qiao.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Shi, Y., Tang, Y., Zheng, H., Kan, Z., Qiao, L. (2024). Parallelized ADMM with General Objectives for Deep Learning. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14489. Springer, Singapore. https://doi.org/10.1007/978-981-97-0798-0_23


  • DOI: https://doi.org/10.1007/978-981-97-0798-0_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0797-3

  • Online ISBN: 978-981-97-0798-0

  • eBook Packages: Computer Science, Computer Science (R0)
