Abstract
In this paper, we present nonmonotone variants of the Levenberg–Marquardt (LM) method for training recurrent neural networks (RNNs). These methods inherit the benefits of previously developed LM-with-momentum algorithms and are equipped with nonmonotone acceptance criteria, which allow temporary increases in the training error, together with an adaptive scheme for tuning the size of the nonmonotone sliding window. The proposed algorithms are applied to training RNNs of various sizes and architectures on symbolic sequence-processing problems. Experiments show that the proposed nonmonotone learning algorithms train RNNs for sequence processing more effectively than the original monotone methods.
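The core idea described above can be illustrated with a minimal sketch of a Levenberg–Marquardt loop that uses a Grippo-style nonmonotone acceptance test: a trial step is accepted when the new error does not exceed the maximum error over a sliding window of recent accepted iterates, rather than the most recent error alone. This is an assumption-laden illustration, not the paper's algorithm: the momentum term and the adaptive window-size scheme from the paper are omitted, and the function and parameter names are invented for the example.

```python
import numpy as np

def nonmonotone_lm(residual, jacobian, w, max_iter=100, window=5, mu=1e-2):
    """Levenberg-Marquardt with a Grippo-style nonmonotone acceptance test.

    Illustrative sketch only: a step is accepted when the new error does
    not exceed the maximum error over the last `window` accepted iterates,
    so temporary increases in training error are tolerated. The momentum
    term and adaptive window sizing used in the paper are not modeled.
    """
    errors = [0.5 * np.sum(residual(w) ** 2)]
    for _ in range(max_iter):
        r, J = residual(w), jacobian(w)
        # LM trial step: solve (J^T J + mu * I) delta = -J^T r
        A = J.T @ J + mu * np.eye(w.size)
        delta = np.linalg.solve(A, -J.T @ r)
        e_new = 0.5 * np.sum(residual(w + delta) ** 2)
        # Nonmonotone criterion: compare against the window maximum,
        # not only the most recent error value.
        if e_new <= max(errors[-window:]):
            w = w + delta
            errors.append(e_new)
            mu = max(mu / 10.0, 1e-12)   # relax damping after acceptance
        else:
            mu *= 10.0                   # strengthen damping after rejection
    return w, errors
```

With `window=1` the test reduces to the ordinary monotone LM acceptance rule, so the sliding window is the only change relative to the classical method.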
Cite this article
Peng, CC., Magoulas, G.D. Nonmonotone Levenberg–Marquardt training of recurrent neural architectures for processing symbolic sequences. Neural Comput & Applic 20, 897–908 (2011). https://doi.org/10.1007/s00521-010-0493-2