Abstract
In this paper, we present nonmonotone variants of the Levenberg–Marquardt (LM) method for training recurrent neural networks (RNNs). These methods inherit the benefits of previously developed LM-with-momentum algorithms and are equipped with nonmonotone acceptance criteria, which allow temporary increases in the training error, together with an adaptive scheme for tuning the size of the nonmonotone sliding window. The proposed algorithms are applied to training RNNs of various sizes and architectures on symbolic sequence-processing problems. Experiments show that the proposed nonmonotone learning algorithms train RNNs for sequence processing more effectively than the original monotone methods.
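The core idea described above can be illustrated with a minimal sketch of a Levenberg–Marquardt loop that uses a Grippo-style nonmonotone acceptance test: a trial step is accepted when the new error does not exceed the maximum error over a sliding window of recent accepted iterates, rather than the most recent error alone. This is an assumption-laden illustration, not the paper's algorithm: the momentum term and the adaptive window-size scheme from the paper are omitted, and the function and parameter names are invented for the example.

```python
import numpy as np

def nonmonotone_lm(residual, jacobian, w, max_iter=100, window=5, mu=1e-2):
    """Levenberg-Marquardt with a Grippo-style nonmonotone acceptance test.

    Illustrative sketch only: a step is accepted when the new error does
    not exceed the maximum error over the last `window` accepted iterates,
    so temporary increases in training error are tolerated. The momentum
    term and adaptive window sizing used in the paper are not modeled.
    """
    errors = [0.5 * np.sum(residual(w) ** 2)]
    for _ in range(max_iter):
        r, J = residual(w), jacobian(w)
        # LM trial step: solve (J^T J + mu * I) delta = -J^T r
        A = J.T @ J + mu * np.eye(w.size)
        delta = np.linalg.solve(A, -J.T @ r)
        e_new = 0.5 * np.sum(residual(w + delta) ** 2)
        # Nonmonotone criterion: compare against the window maximum,
        # not only the most recent error value.
        if e_new <= max(errors[-window:]):
            w = w + delta
            errors.append(e_new)
            mu = max(mu / 10.0, 1e-12)   # relax damping after acceptance
        else:
            mu *= 10.0                   # strengthen damping after rejection
    return w, errors
```

With `window=1` the test reduces to the ordinary monotone LM acceptance rule, so the sliding window is the only change relative to the classical method.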
Cite this article
Peng, CC., Magoulas, G.D. Nonmonotone Levenberg–Marquardt training of recurrent neural architectures for processing symbolic sequences. Neural Comput & Applic 20, 897–908 (2011). https://doi.org/10.1007/s00521-010-0493-2