Abstract
While considerable effort has been devoted to improving models that employ regularized objectives, solving non-convex models directly with most stochastic gradient optimization algorithms remains difficult precisely because of that non-convexity. The Alternating Direction Method of Multipliers (ADMM) has emerged as a promising alternative for both convex and non-convex problems, offering fast convergence and effective constraint handling. However, ADMM has seen little progress in non-convex regularized deep learning, and parallelized ADMM techniques for general non-convex objectives are still lacking. To address these gaps, this paper applies ADMM to general (non-convex regularized) deep learning tasks and provides a comprehensive analysis of its convergence properties. It further proposes a parallelized ADMM framework for such general objectives. Experiments show that ADMM converges stably on non-convex objectives and outperforms its convex-objective counterpart; the computational efficiency of the proposed parallelized framework is also evaluated.
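To make the iteration the abstract refers to concrete, the following is a minimal NumPy sketch of scaled-form ADMM on a toy non-convex problem: l0-regularized least squares. This is an illustrative instance of the general ADMM template only, not the paper's deep-learning formulation; the splitting, function name, and parameter values are assumptions chosen for the demo.

import numpy as np

def admm_l0_least_squares(A, b, lam=0.1, rho=1.0, iters=100):
    # Illustrative sketch (not the paper's algorithm):
    #   minimize 0.5*||A x - b||^2 + lam*||x||_0
    # split as f(x) = 0.5*||A x - b||^2, g(z) = lam*||z||_0, s.t. x = z.
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The x-update is a ridge-type linear solve; factor the matrix once.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: argmin_x f(x) + (rho/2)*||x - z + u||^2
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: the proximal map of the non-convex l0 penalty
        # is hard thresholding (keep v_i when v_i^2 > 2*lam/rho).
        v = x + u
        z = np.where(v**2 > 2.0 * lam / rho, v, 0.0)
        # dual update drives x and z toward consensus (x = z)
        u = u + x - z
    return z

# Toy usage on synthetic sparse data.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_l0_least_squares(A, b)

Note where the non-convexity lives: only in the z-update, whose proximal step remains closed-form (hard thresholding) even though the l0 penalty is non-convex and non-smooth. This separability of the hard subproblem is what makes ADMM attractive for the non-convex regularized objectives the paper studies.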
Cite this paper
Shi, Y., Tang, Y., Zheng, H., Kan, Z., Qiao, L. (2024). Parallelized ADMM with General Objectives for Deep Learning. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14489. Springer, Singapore. https://doi.org/10.1007/978-981-97-0798-0_23