Abstract
In the conventional recursive least square (RLS) algorithm for multilayer feedforward neural networks, the magnitude of the weights can be limited by controlling the initial error covariance matrix. However, the resulting weight decay effect diminishes as the number of learning epochs increases. Although the original RLS algorithm can be modified to maintain a constant weight decay effect, the computational and space complexities of this modified algorithm are very high. This paper first presents a more compact set of RLS equations for the modified algorithm. We then propose a decoupled version that reduces both the computational and space complexities. Computer simulations demonstrate the effectiveness of this decoupled algorithm.
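To make the idea concrete, the following is a minimal sketch of an RLS update with a constant weight-decay (leakage) term. It is an illustration under assumptions, not the paper's exact modified-RLS equations: here `alpha` shrinks the weights by a fixed factor at every step, so the decay effect does not fade as training proceeds, while the gain and covariance updates follow the standard exponentially weighted RLS recursion with forgetting factor `lam`.

```python
import numpy as np

def rls_weight_decay_step(w, P, x, d, lam=0.99, alpha=1e-4):
    """One RLS update with a simple constant weight-decay (leakage) term.

    Illustrative sketch only, not the authors' modified-RLS equations:
    `alpha` applies a fixed multiplicative shrinkage to the weights each
    step, mimicking a weight-decay effect that does not vanish over epochs.
    """
    Px = P @ x
    k = Px / (lam + x @ Px)           # RLS gain vector
    e = d - w @ x                     # a-priori prediction error
    w = (1.0 - alpha) * w + k * e     # weight update with constant leakage
    P = (P - np.outer(k, Px)) / lam   # error covariance update
    return w, P

# Usage: track a noisy linear target d = 2*x0 - x1 + noise.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
P = np.eye(2) * 100.0                 # large initial covariance
for _ in range(500):
    x = rng.standard_normal(2)
    d = w_true @ x + 0.01 * rng.standard_normal()
    w, P = rls_weight_decay_step(w, P, x, d)
```

With a small `alpha`, the estimate converges close to the true weights but is biased slightly toward zero; setting `alpha = 0` recovers the standard exponentially weighted RLS recursion, in which any decay effect induced by the initial covariance fades as more samples arrive.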
Acknowledgments
The work was supported by a research grant from City University of Hong Kong (7002701).
Cite this article
Leung, A.CS., Xiao, Y., Xu, Y. et al. Decouple implementation of weight decay for recursive least square. Neural Comput & Applic 21, 1709–1716 (2012). https://doi.org/10.1007/s00521-012-0832-6