Abstract
Large-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage multiple workers for training. However, collecting the information from all workers in every iteration is sometimes expensive or even prohibitive. In this paper, we propose an iterative algorithm called proximal reweighted incremental aggregated gradient (PRIAG) for solving a class of nonconvex and nonsmooth problems, which are ubiquitous in machine learning and distributed optimization. In each iteration, the algorithm needs information from only one worker, owing to the incremental aggregation scheme. Combined with the reweighting technique, it handles the nonconvex and nonsmooth terms through an easy-to-calculate proximal operator. Using a Lyapunov function analysis, we prove that PRIAG converges under mild assumptions. We apply this approach to nonconvex nonsmooth problems and distributed optimization tasks. Numerical experiments on both synthetic and real data sets show that our algorithm achieves learning performance comparable to previous nonconvex solvers while being more efficient.
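To make the idea above concrete, the following is a minimal, illustrative Python sketch of a PRIAG-style iteration for a sparse least-squares problem. The quadratic losses, the log-sum penalty \(h(t)=\lambda\log(1+t/\varepsilon)\) with \(g(t)=|t|\), the cyclic worker order, and the step-size choice are assumptions made for illustration only; this is not the paper's exact update rule (7) or experimental setup.

```python
import numpy as np

# Hedged sketch of a PRIAG-style iteration (illustrative, not the authors' exact method):
# minimize  sum_i f_i(x) + sum_j h(|x_j|),
# with f_i(x) = 0.5*||A_i x - b_i||^2 (one block per "worker") and the log-sum
# penalty h(t) = lam*log(1 + t/eps), whose derivative h'(t) = lam/(eps + t)
# supplies the reweighting weights w_j = h'(|x_j|).

def priag_sketch(A_blocks, b_blocks, lam=0.1, eps=1e-2, n_epochs=50):
    m = len(A_blocks)
    n = A_blocks[0].shape[1]
    x = np.zeros(n)

    # Incremental aggregated gradient: keep one (possibly stale) gradient per worker.
    grads = [A.T @ (A @ x - b) for A, b in zip(A_blocks, b_blocks)]
    agg = sum(grads)

    # Conservative step size from the total smoothness constant L = sum_i L_i
    # (the convergence condition in the paper is of the form gamma < 2/((2*tau+1)*L)
    # for a delay bound tau; the concrete constants here are our assumption).
    L = sum(np.linalg.norm(A, 2) ** 2 for A in A_blocks)
    tau = m - 1                      # worst-case staleness under cyclic updates
    gamma = 1.9 / ((2 * tau + 1) * L)

    for _ in range(n_epochs):
        for i in range(m):           # visit one worker per iteration (cyclic order)
            # Refresh only worker i's gradient; the others stay stale.
            new_grad = A_blocks[i].T @ (A_blocks[i] @ x - b_blocks[i])
            agg += new_grad - grads[i]
            grads[i] = new_grad

            # Reweighting: linearize the concave h at the current iterate.
            w = lam / (eps + np.abs(x))

            # Weighted soft-thresholding = proximal step of gamma * sum_j w_j*|x_j|.
            z = x - gamma * agg
            x = np.sign(z) * np.maximum(np.abs(z) - gamma * w, 0.0)
    return x
```

With \(g(t)=|t|\), the proximal step reduces to coordinate-wise soft-thresholding with thresholds \(\gamma w_j\), which is one instance of the "easy-to-calculate proximal operator" the abstract alludes to.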
References
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Optim. Mach. Learn. 2010(1–38), 3 (2011)
Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3869–3872. IEEE (2008)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129–159 (2001)
Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz \(\ell _ p\)-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering. SC, pp. 115–163. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41589-5_4
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Duchi, J.C., Agarwal, A., Wainwright, M.J.: Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Trans. Autom. control 57(3), 592–606 (2011)
Figueiredo, M.A., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1(4), 586–597 (2007)
Gasso, G., Rakotomamonjy, A., Canu, S.: Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans. Signal Process. 57(12), 4686–4698 (2009)
Giannakis, G.B., Kekatos, V., Gatsis, N., Kim, S.J., Zhu, H., Wollenberg, B.F.: Monitoring and optimization for power grids: a signal processing perspective. IEEE Signal Process. Mag. 30(5), 107–128 (2013)
Gong, P., Ye, J., Zhang, C.: Robust multi-task feature learning. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 895–903 (2012)
Guo, F., Wen, C., Mao, J., Song, Y.D.: Distributed economic dispatch for smart grids with random wind power. IEEE Trans. Smart Grid 7(3), 1572–1583 (2015)
Jacob, L., Obozinski, G., Vert, J.P.: Group lasso with overlap and graph lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440 (2009)
Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numeric. Anal. 51(2), 927–957 (2013)
Lu, C., Wei, Y., Lin, Z., Yan, S.: Proximal iteratively reweighted algorithm with multiple splitting for nonconvex sparsity optimization. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
Lu, C., Zhu, C., Xu, C., Yan, S., Lin, Z.: Generalized singular value thresholding. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
Lu, Z., Zhang, Y.: Schatten-p quasi-norm regularized matrix optimization via iterative reweighted singular value minimization. arXiv preprint arXiv:1401.0869 (2015)
Mateos, G., Bazerque, J.A., Giannakis, G.B.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)
Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation I: Basic Theory, vol. 330. Springer, Heidelberg (2006)
Padakandla, A., Sundaresan, R.: Separable convex optimization problems with linear ascending constraints. SIAM J. Optim. 20(3), 1185–1204 (2010)
Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends® Optim. 1(3), 127–239 (2014)
Rabbat, M., Nowak, R.: Distributed optimization in sensor networks. In: Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pp. 20–27 (2004)
Rabbat, M.G., Nowak, R.D.: Decentralized source localization and tracking [wireless sensor networks]. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. vol. 3, pp. iii–921. IEEE (2004)
Recht, B., Re, C., Wright, S., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 693–701 (2011)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Heidelberg (2009)
Shi, W., Ling, Q., Wu, G., Yin, W.: A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Signal Process. 63(22), 6013–6023 (2015)
Sun, T., Jiang, H., Cheng, L.: Convergence of proximal iteratively reweighted nuclear norm algorithm for image processing. IEEE Trans. Image Process. 26(12), 5632–5644 (2017)
Sun, T., Jiang, H., Cheng, L.: Global convergence of proximal iteratively reweighted algorithm. J. Global Optim. 68(4), 815–826 (2017). https://doi.org/10.1007/s10898-017-0507-z
Sun, T., Jiang, H., Cheng, L., Zhu, W.: Iteratively linearized reweighted alternating direction method of multipliers for a class of nonconvex problems. IEEE Trans. Signal Process. 66(20), 5380–5391 (2018)
Sun, T., Li, D., Jiang, H., Quan, Z.: Iteratively reweighted penalty alternating minimization methods with continuation for image deblurring. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3757–3761. IEEE (2019)
Sun, T., Sun, Y., Li, D., Liao, Q.: General proximal incremental aggregated gradient algorithms: Better and novel results under general scheme. In: Advances in Neural Information Processing Systems, pp. 994–1004 (2019)
Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 28(2), 1282–1300 (2018)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3(Mar), 1439–1461 (2003)
Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2008)
Acknowledgement
This research was funded in part by the Core Electronic Devices, High-end Generic Chips, and Basic Software Major Special Projects (No. 2018ZX01028101), and in part by the National Natural Science Foundation of China (No. 61907034, No. 61932001, and No. 61906200).
Appendix
Proof of Lemma 1
Note that \(x^{k+1}\) is obtained by (7); combining this with Definitions 1 and 2, we have
With the convexity of g, we have
Assumption (b) implies that the sum function \(F\) is differentiable with an \(L\)-Lipschitz continuous gradient, i.e.,
where \(L=\sum _{i=1}^{m}L_i\). We can further get
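For reference, the two standard facts invoked here are (our restatement, under assumption (b) that each \(\nabla f_i\) is \(L_i\)-Lipschitz):
\[
\|\nabla F(x)-\nabla F(y)\| \le L\|x-y\|,\qquad L=\sum_{i=1}^{m}L_i,
\]
and the resulting descent lemma
\[
F(y)\le F(x)+\langle \nabla F(x),\,y-x\rangle+\frac{L}{2}\|x-y\|^{2}\quad\text{for all }x,y.
\]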
Then we have
where \(w_j^k := h^{\prime}(g(x_j^k))\) and \(\tilde{g}=(g(x_1), g(x_2), \ldots , g(x_N))^{\top }\). The first inequality uses (29), the second uses the concavity of \(h\), and the third uses (27). In the following, we bound the term \(I\). First, notice that \(\max_{i,k}\{\tau _{i,k}\}\le \tau \); then
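For completeness, the concavity bound referred to above is presumably the standard linearization of \(h\) at the current iterate, which is what makes \(w_j^k=h^{\prime}(g(x_j^k))\) act as a reweighting:
\[
h\big(g(x_j^{k+1})\big)\le h\big(g(x_j^{k})\big)+w_j^{k}\big(g(x_j^{k+1})-g(x_j^{k})\big).
\]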
Combining with (7), we have
The first inequality uses the Lipschitz continuity of \(\nabla f_{i}\). The second inequality uses (31). Meanwhile, for any \(\xi >0\), we have the following Cauchy’s inequality
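The inequality meant here is, in all likelihood, the weighted form of the Cauchy–Schwarz (Young) inequality: for any vectors \(a,b\) and any \(\xi>0\),
\[
2\langle a, b\rangle \le \xi\|a\|^{2}+\frac{1}{\xi}\|b\|^{2}.
\]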
Then we have
If \(0<\gamma <\frac{2}{(2\tau +1)L}\), we can choose \(\xi >0\), such that
Then, with direct calculations and substitutions, we have
The first inequality uses (35); the last equation uses (36). This proves the first result. Summing inequality (37), we obtain:
The second result then follows immediately. Using Lemma 3 of [9], we are led to
which directly yields the third result.
Proof of Theorem 1
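Throughout this proof, the subdifferential is the limiting subdifferential of [30, 36]; for the reader's convenience, its standard construction from the Fréchet subdifferential is
\[
\hat{\partial}F(x)=\Big\{v:\ \liminf_{y\to x,\,y\neq x}\frac{F(y)-F(x)-\langle v,\,y-x\rangle}{\|y-x\|}\ge 0\Big\},
\]
\[
\partial F(x)=\big\{v:\ \exists\, x^{k}\to x,\ F(x^{k})\to F(x),\ v^{k}\in\hat{\partial}F(x^{k}),\ v^{k}\to v\big\}.
\]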
By the definition of subdifferential, we have
That means
where \(H(x):=\sum _{j=1}^{n} h\left( g\left( x_{j}\right) \right) \). Thus we have
Combining with Lemma 1,
Using Lemma 3 of [9] again, the result follows.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Deng, X., Sun, T., Liu, F., Huang, F. (2020). PRIAG: Proximal Reweighted Incremental Aggregated Gradient Algorithm for Distributed Optimizations. In: Qiu, M. (ed.) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol. 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_34
DOI: https://doi.org/10.1007/978-3-030-60245-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60244-4
Online ISBN: 978-3-030-60245-1
eBook Packages: Mathematics and Statistics (R0)