
Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero


Abstract

We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
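
To make the iteration concrete, the following is a minimal sketch of an incremental gradient method with a constant stepsize (hence bounded away from zero), for minimizing a sum of component functions. It is not the paper's notation or experiments: the function names, the least-squares components, the data, and the stepsize value are all illustrative assumptions. Under such a constant stepsize, one expects only an ε-approximate solution, with ε proportional to the stepsize, in line with the analysis described above.

```python
import numpy as np

def incremental_gradient(grad_components, x0, stepsize=0.01, n_passes=100):
    """Incremental gradient sketch for minimizing f(x) = sum_i f_i(x).

    The stepsize is held constant (bounded away from zero), so the
    iterates are only expected to approach an approximate stationary
    point, with the error controlled by the stepsize.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_passes):
        # One pass over the data: update x using one component gradient at a time.
        for grad_i in grad_components:
            x = x - stepsize * grad_i(x)
    return x

# Illustrative least-squares components: f_i(x) = 0.5 * (a_i @ x - b_i)^2,
# so grad f_i(x) = (a_i @ x - b_i) * a_i.  The data below are made up.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = rng.normal(size=50)
grads = [lambda x, a=a, y=y: (a @ x - y) * a for a, y in zip(A, b)]

x_approx = incremental_gradient(grads, x0=np.zeros(3), stepsize=0.005)
```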





Cite this article

Solodov, M. Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero. Computational Optimization and Applications 11, 23–35 (1998). https://doi.org/10.1023/A:1018366000512
