Abstract
We consider incrementally updated gradient methods for minimizing the sum of smooth functions and a convex function. These methods can use a (sufficiently small) constant stepsize or, more practically, an adaptive stepsize that is decreased whenever sufficient progress is not made. We show that every cluster point of the iterates generated by such a method is a stationary point, provided that either the gradients of the smooth functions are Lipschitz continuous on the space of n-dimensional real column vectors, or the gradients of the smooth functions are bounded and Lipschitz continuous over a certain level set and the convex function is Lipschitz continuous on its domain. If, in addition, a local Lipschitz error bound assumption holds, then the method is linearly convergent.
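To illustrate the problem class, the following is a minimal Python sketch of an incremental proximal gradient loop for a least-squares-plus-ℓ1 instance: one stored component gradient is refreshed per inner step, the aggregated gradient drives a soft-thresholding (proximal) step, and the stepsize is reduced when a simple sufficient-decrease test fails. The function names, the decrease test, and the least-squares/ℓ1 choice are illustrative assumptions, not the paper's exact algorithm or stepsize rule.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def incremental_prox_gradient(A, b, lam, x0, alpha=1.0, beta=0.5,
                              sigma=1e-4, max_epochs=100):
    """Sketch of an incremental proximal gradient method for
        min_x  sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||x||_1.
    One stored component gradient is refreshed per inner step; the
    stepsize alpha is reduced when sufficient decrease is not observed
    (an illustrative stand-in for the paper's adaptive rule)."""
    m, _ = A.shape
    x = x0.astype(float)

    def objective(y):
        return 0.5 * np.sum((A @ y - b) ** 2) + lam * np.sum(np.abs(y))

    # stored gradients of the m smooth components, all evaluated at x0
    grads = A * (A @ x - b)[:, None]          # row i = (a_i^T x - b_i) * a_i
    g_sum = grads.sum(axis=0)                 # aggregated (incrementally updated) gradient

    for _ in range(max_epochs):
        for i in range(m):
            # proximal step along the aggregated gradient
            x_trial = soft_threshold(x - alpha * g_sum, alpha * lam)
            # adaptive stepsize: accept the trial point only on sufficient decrease
            decrease = objective(x) - objective(x_trial)
            if decrease >= sigma * np.sum((x_trial - x) ** 2) / alpha:
                x = x_trial
            else:
                alpha *= beta                 # insufficient progress: shrink stepsize
            # refresh the i-th stored gradient at the current iterate
            g_new = A[i] * (A[i] @ x - b[i])
            g_sum += g_new - grads[i]
            grads[i] = g_new
    return x
```

Here g_sum plays the role of an incrementally updated estimate of the full gradient: only one component is re-evaluated per inner step, which is what distinguishes incremental methods from full proximal gradient descent; a constant-stepsize variant would simply skip the decrease test.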
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A1006406).
We thank the anonymous referees for their detailed comments, which helped improve the paper.
Additional information
Communicated by Masao Fukushima.
Cite this article
Tseng, P., Yun, S. Incrementally Updated Gradient Methods for Constrained and Regularized Optimization. J Optim Theory Appl 160, 832–853 (2014). https://doi.org/10.1007/s10957-013-0409-2