Abstract
While considerable effort has been devoted to improving models that employ regularized objectives, solving non-convex models directly with most stochastic gradient optimization algorithms remains difficult precisely because of that non-convexity. The Alternating Direction Method of Multipliers (ADMM) has emerged as a promising alternative for both convex and non-convex problems, offering fast convergence and effective constraint handling. However, ADMM has seen little progress in non-convex regularized deep learning, and parallelized ADMM techniques for general non-convex objectives are still lacking. To address these gaps, this paper applies ADMM to general (non-convex regularized) deep learning tasks and provides a comprehensive analysis of its convergence properties. It further proposes a parallelized ADMM framework for such general objectives. Experiments show that ADMM converges stably on non-convex objectives and outperforms its convex-objective counterpart; the computational efficiency of the proposed parallelized framework is also evaluated.
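To make the iteration the abstract refers to concrete, the following is a minimal NumPy sketch of scaled-form ADMM on a toy non-convex problem: l0-regularized least squares. This is an illustrative instance of the general ADMM template only, not the paper's deep-learning formulation; the splitting, function name, and parameter values are assumptions chosen for the demo.

import numpy as np

def admm_l0_least_squares(A, b, lam=0.1, rho=1.0, iters=100):
    # Illustrative sketch (not the paper's algorithm):
    #   minimize 0.5*||A x - b||^2 + lam*||x||_0
    # split as f(x) = 0.5*||A x - b||^2, g(z) = lam*||z||_0, s.t. x = z.
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The x-update is a ridge-type linear solve; factor the matrix once.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: argmin_x f(x) + (rho/2)*||x - z + u||^2
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: the proximal map of the non-convex l0 penalty
        # is hard thresholding (keep v_i when v_i^2 > 2*lam/rho).
        v = x + u
        z = np.where(v**2 > 2.0 * lam / rho, v, 0.0)
        # dual update drives x and z toward consensus (x = z)
        u = u + x - z
    return z

# Toy usage on synthetic sparse data.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_l0_least_squares(A, b)

Note where the non-convexity lives: only in the z-update, whose proximal step remains closed-form (hard thresholding) even though the l0 penalty is non-convex and non-smooth. This separability of the hard subproblem is what makes ADMM attractive for the non-convex regularized objectives the paper studies.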
Cite this paper
Shi, Y., Tang, Y., Zheng, H., Kan, Z., Qiao, L. (2024). Parallelized ADMM with General Objectives for Deep Learning. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14489. Springer, Singapore. https://doi.org/10.1007/978-981-97-0798-0_23