Nonmonotone Levenberg–Marquardt training of recurrent neural architectures for processing symbolic sequences

  • EANN 2009
  • Published in Neural Computing and Applications

Abstract

In this paper, we present nonmonotone variants of the Levenberg–Marquardt (LM) method for training recurrent neural networks (RNNs). These methods inherit the benefits of previously developed LM-with-momentum algorithms and are equipped with nonmonotone criteria, which allow a temporary increase in training error, and with an adaptive scheme for tuning the size of the nonmonotone sliding window. The proposed algorithms are applied to training RNNs of various sizes and architectures on symbolic sequence-processing problems. Experiments show that the proposed nonmonotone learning algorithms train RNNs for sequence processing more effectively than the original monotone methods.
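The paper's algorithms are not reproduced on this page, but the core idea of a nonmonotone acceptance criterion — accepting an LM step whenever the new error does not exceed the maximum error over a sliding window of recent iterations, rather than requiring it to beat the current error — can be sketched on a toy one-parameter least-squares problem. All names, constants, and the damping-update rule below are illustrative assumptions, not taken from the paper; the adaptive window-size scheme and the momentum term described in the abstract are omitted.

```python
import math

# Toy data generated from y = exp(0.5 * x); the sketch fits the scalar a.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]

def residuals(a):
    return [math.exp(a * x) - y for x, y in zip(xs, ys)]

def sse(a):
    return sum(r * r for r in residuals(a))

def lm_nonmonotone(a0, window=5, lam=1e-2, iters=50):
    a, history = a0, [sse(a0)]
    for _ in range(iters):
        r = residuals(a)
        # Jacobian of the residuals w.r.t. the scalar parameter a
        j = [x * math.exp(a * x) for x in xs]
        g = sum(ji * ri for ji, ri in zip(j, r))   # J^T r
        h = sum(ji * ji for ji in j)               # J^T J (scalar here)
        step = -g / (h + lam)                      # damped LM step
        a_new = a + step
        # Nonmonotone test: compare against the worst error in the
        # sliding window, so a temporary error increase is tolerated.
        if sse(a_new) <= max(history[-window:]):
            a = a_new
            lam = max(lam * 0.5, 1e-12)            # relax damping
        else:
            lam *= 2.0                             # tighten damping, retry
        history.append(sse(a))
    return a

a_fit = lm_nonmonotone(0.0)   # converges toward the true value 0.5
```

A monotone LM variant would replace `max(history[-window:])` with `history[-1]`; the window is what lets the iterate escape shallow regions where a strictly decreasing error sequence would force very small, heavily damped steps.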


Author information

Corresponding author

Correspondence to Chun-Cheng Peng.

About this article

Cite this article

Peng, CC., Magoulas, G.D. Nonmonotone Levenberg–Marquardt training of recurrent neural architectures for processing symbolic sequences. Neural Comput & Applic 20, 897–908 (2011). https://doi.org/10.1007/s00521-010-0493-2
