Decouple implementation of weight decay for recursive least square

  • Original Article
  • Neural Computing and Applications

Abstract

In the conventional recursive least square (RLS) algorithm for multilayer feedforward neural networks, controlling the initial error covariance matrix can limit weight magnitude. However, the weight decay effect decreases linearly as the number of learning epochs increases. Although we can modify the original RLS algorithm to maintain a constant weight decay effect, the computational and space complexities of the modified RLS algorithm are very high. This paper first presents a set of more compact RLS equations for this modified RLS algorithm. Afterwards, to reduce the computational and space complexities, we propose a decoupled version of this algorithm. The effectiveness of this decoupled algorithm is demonstrated by computer simulations.
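
To make the abstract's mechanism concrete, here is a minimal Python sketch of RLS for a linear-in-parameters model. Initialising the error covariance as P(0) = (1/delta)·I is what gives the implicit weight decay described above: it corresponds to a delta·||w||^2 penalty whose relative influence shrinks as more samples are processed. The optional alpha step is a simplified leakage-style stand-in for a constant weight decay term, invented here for illustration; the paper's actual modified and decoupled RLS equations are not reproduced on this page.

```python
import numpy as np

def rls_fit(X, d, delta=1.0, alpha=0.0):
    """RLS for a linear-in-parameters model y = x^T w.

    delta: sets P(0) = (1/delta) I; acts like a weight-decay penalty
           whose effect fades as more samples are processed.
    alpha: optional constant-strength decay step applied at each
           iteration (a hypothetical simplification, not the paper's
           modified RLS equations).
    """
    n = X.shape[1]
    w = np.zeros(n)
    P = np.eye(n) / delta              # initial error covariance
    for x, target in zip(X, d):
        Px = P @ x
        k = Px / (1.0 + x @ Px)        # Kalman gain
        w = w + k * (target - x @ w)   # innovation update
        P = P - np.outer(k, Px)        # covariance update
        if alpha > 0.0:
            w = w - alpha * (P @ w)    # constant weight-decay step
    return w

# Toy usage: a larger delta or alpha > 0 yields smaller weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
d = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
print(rls_fit(X, d, delta=1.0))              # decay effect fades over time
print(rls_fit(X, d, delta=1.0, alpha=0.01))  # constant decay effect
```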

Notes

  1. Note that such a decoupling technique is commonly used in numerical methods [15, 23, 24]: at each training iteration, each decoupled weight vector is updated separately. A sketch of this pattern is given below.
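
As a rough illustration of the decoupling pattern described in the note above, the sketch below maintains one small covariance matrix per weight group and updates each group separately, reducing covariance storage from N^2 entries to the sum of the squared group sizes. It assumes a linear-in-parameters model; the class name DecoupledRLS and its interface are invented for illustration and are not the paper's exact decoupled weight-decay RLS equations.

```python
import numpy as np

class DecoupledRLS:
    """Block-diagonal RLS: one small covariance matrix per weight group."""

    def __init__(self, group_sizes, delta=1.0):
        # P_i(0) = (1/delta) I for each group, mirroring full RLS
        self.w = [np.zeros(n) for n in group_sizes]
        self.P = [np.eye(n) / delta for n in group_sizes]

    def step(self, xs, d):
        # xs[i] is the input sub-vector seen by weight group i
        e = d - sum(x @ w for x, w in zip(xs, self.w))  # shared error
        for i, x in enumerate(xs):
            Px = self.P[i] @ x
            k = Px / (1.0 + x @ Px)           # per-group Kalman gain
            self.w[i] = self.w[i] + k * e     # each group updated separately
            self.P[i] = self.P[i] - np.outer(k, Px)

# Toy usage: two weight groups of sizes 3 and 2
rng = np.random.default_rng(1)
model = DecoupledRLS([3, 2])
for _ in range(200):
    xs = [rng.normal(size=3), rng.normal(size=2)]
    d = xs[0] @ np.array([1.0, -1.0, 2.0]) + xs[1] @ np.array([0.5, 0.0])
    model.step(xs, d)
print([np.round(w, 2) for w in model.w])
```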

References

  1. Leung CS, Wong KW, Sum PF, Chan LW (1996) On-line training and pruning for RLS algorithms. Electron Lett 32(23):2152–2153

  2. Leung CS, Wong KW, Sum PF, Chan LW (2001) A pruning method for the recursive least squared algorithm. Neural Netw 14(2):147–174

  3. Leung CS, Sum J, Young GH, Kan WK (1999) On the Kalman filtering method in neural networks training and pruning. IEEE Trans Neural Netw 10(1):161–165

  4. Scalero R, Tepedelenlioglu N (1992) A fast new algorithm for training feedforward neural networks. IEEE Trans Signal Process 40(1):202–210

  5. Shah S, Palmieri F, Datum M (1992) Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Netw 5(5):779–787

  6. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Parallel distributed processing: explorations in the microstructure of cognition, vol 1. MIT Press, Cambridge, MA, pp 318–362

  7. Moody JE (1991) Note on generalization, regularization, and architecture selection in nonlinear learning systems. In: Proceedings of the first IEEE-SP workshop on neural networks for signal processing, pp 1–10

  8. Krogh A, Hertz JA (1992) A simple weight decay can improve generalization. In: Advances in neural information processing systems 4. Morgan Kaufmann, pp 950–957

  9. Chen S, Hong X, Harris C, Sharkey P (2004) Sparse modelling using orthogonal forward regression with PRESS statistic and regularization. IEEE Trans Syst Man Cybern B 34(2):898–911

  10. Sum J, Leung CS, Ho KI-J (2009) On objective function, regularizer, and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Trans Neural Netw 20(1):124–138

  11. Bernier JL, Ortega J, Rojas I, Ros E, Prieto A (2000) Obtaining fault tolerant multilayer perceptrons using an explicit regularization. Neural Process Lett 12(2):107–113

  12. Leung CS, Tsoi AC, Chan LW (2001) Two regularizers for recursive least square algorithms in feedforward multilayered neural networks. IEEE Trans Neural Netw 12(6):1314–1332

  13. Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4:251–257

  14. Mosca E (1995) Optimal predictive and adaptive control. Prentice-Hall, Englewood Cliffs, NJ

  15. Haykin S (1991) Adaptive filter theory. Prentice-Hall, Englewood Cliffs, NJ

  16. MacKay DJC (1992) Bayesian interpolation. Neural Comput 4(3):415–447

  17. MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4(3):448–472

  18. Moody JE (1992) The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. In: Advances in neural information processing systems 4. Morgan Kaufmann, pp 847–854

  19. Rögnvaldsson TS (1998) A simple trick for estimating the weight decay parameter. In: Neural networks: tricks of the trade. Springer, London, pp 71–92

  20. Sugiyama M, Ogawa H (2002) Optimal design of regularization term and regularization parameter by subspace information criterion. Neural Netw 15(3):349–361

  21. Guo P (2002) Studies of model selection and regularization for generalization in neural networks with applications. PhD dissertation, The Chinese University of Hong Kong, Hong Kong (supervisor: Michael R. Lyu)

  22. Guo P, Lyu M, Chen C (2003) Regularization parameter estimation for feedforward neural networks. IEEE Trans Syst Man Cybern B 33(1):35–44

  23. Hager WW (1989) Applied numerical linear algebra. Prentice-Hall, Englewood Cliffs, NJ

  24. Li S, Wunsch DC, O’Hair E, Giesselmann MG (2002) Extended Kalman filter training of neural networks on a SIMD parallel machine. J Parallel Distrib Comput 62(4):544–562

  25. Xiao Y, Leung C-S, Ho T-Y, Lam P-M (2011) A GPU implementation for LBG and SOM training. Neural Comput Appl 20(7):1035–1042. doi:10.1007/s00521-010-0403-7

  26. Ho T-Y, Lam P-M, Leung C-S (2008) Parallelization of cellular neural networks on GPU. Pattern Recogn 41(8):2684–2692

Acknowledgments

The work was supported by a research grant from City University of Hong Kong (7002701).

Author information

Corresponding author

Correspondence to Andrew Chi-Sing Leung.

About this article

Cite this article

Leung, A.C.-S., Xiao, Y., Xu, Y. et al. Decouple implementation of weight decay for recursive least square. Neural Comput & Applic 21, 1709–1716 (2012). https://doi.org/10.1007/s00521-012-0832-6
