Abstract
In the conventional recursive least square (RLS) algorithm for multilayer feedforward neural networks, the magnitude of the weights can be limited by controlling the initial error covariance matrix. However, the resulting weight decay effect diminishes as the number of learning epochs increases. Although the original RLS algorithm can be modified to maintain a constant weight decay effect, the computational and space complexities of this modified algorithm are very high. This paper first presents a more compact set of RLS equations for the modified algorithm. We then propose a decoupled version that reduces both the computational and space complexities. Computer simulations demonstrate the effectiveness of this decoupled algorithm.
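To make the idea concrete, the following is a minimal sketch of an RLS update with a constant weight-decay (leakage) term. It is an illustration under assumptions, not the paper's exact modified-RLS equations: here `alpha` shrinks the weights by a fixed factor at every step, so the decay effect does not fade as training proceeds, while the gain and covariance updates follow the standard exponentially weighted RLS recursion with forgetting factor `lam`.

```python
import numpy as np

def rls_weight_decay_step(w, P, x, d, lam=0.99, alpha=1e-4):
    """One RLS update with a simple constant weight-decay (leakage) term.

    Illustrative sketch only, not the authors' modified-RLS equations:
    `alpha` applies a fixed multiplicative shrinkage to the weights each
    step, mimicking a weight-decay effect that does not vanish over epochs.
    """
    Px = P @ x
    k = Px / (lam + x @ Px)           # RLS gain vector
    e = d - w @ x                     # a-priori prediction error
    w = (1.0 - alpha) * w + k * e     # weight update with constant leakage
    P = (P - np.outer(k, Px)) / lam   # error covariance update
    return w, P

# Usage: track a noisy linear target d = 2*x0 - x1 + noise.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
P = np.eye(2) * 100.0                 # large initial covariance
for _ in range(500):
    x = rng.standard_normal(2)
    d = w_true @ x + 0.01 * rng.standard_normal()
    w, P = rls_weight_decay_step(w, P, x, d)
```

With a small `alpha`, the estimate converges close to the true weights but is biased slightly toward zero; setting `alpha = 0` recovers the standard exponentially weighted RLS recursion, in which any decay effect induced by the initial covariance fades as more samples arrive.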
Acknowledgments
The work was supported by a research grant from City University of Hong Kong (7002701).
Cite this article
Leung, A.CS., Xiao, Y., Xu, Y. et al. Decouple implementation of weight decay for recursive least square. Neural Comput & Applic 21, 1709–1716 (2012). https://doi.org/10.1007/s00521-012-0832-6