Abstract
This paper discusses learning algorithms for layered neural networks from the standpoint of maximum likelihood estimation. The Fisher information is calculated explicitly for a network with a single neuron, and can be interpreted as a weighted covariance matrix of the input vectors. A learning algorithm based on Fisher's scoring method is presented and shown to be interpretable as iterations of a weighted least squares method. These results are then extended to a layered network with one hidden layer, for which the Fisher information is likewise given as a weighted covariance matrix of the inputs and the outputs of the hidden units. Two new algorithms are proposed that exploit this information, and experiments show that they converge in fewer iterations than the usual BP algorithm. In particular, the UFS (unitwise Fisher's scoring) method reduces to an algorithm in which each unit estimates its own weights by a weighted least squares method.
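The single-neuron case described in the abstract, with a sigmoid output and a Bernoulli likelihood, is equivalent to logistic regression, where Fisher's scoring method is the classical iteratively reweighted least squares (IRLS) procedure. The sketch below illustrates that correspondence; it is a minimal NumPy illustration of generic IRLS, not the paper's own implementation, and the function name, the `ridge` stabilizer, and the iteration count are choices made here for the example.

```python
import numpy as np

def irls_single_neuron(X, y, n_iter=20, ridge=1e-8):
    """Fisher scoring (IRLS) for one sigmoid neuron, i.e. logistic regression.

    X: (n, d) input matrix (include a bias column if desired).
    y: (n,) binary targets in {0, 1}.

    At each step the Fisher information is the weighted covariance
    X^T W X with W = diag(p * (1 - p)), and the Fisher-scoring update
    is exactly the solution of a weighted least squares problem.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # neuron output
        W = p * (1.0 - p)                       # Fisher weights
        # Working response for the weighted least squares step:
        # z = Xw + W^{-1}(y - p), guarded against tiny weights.
        z = X @ w + (y - p) / np.maximum(W, ridge)
        # Solve (X^T W X) w = X^T W z; the left-hand matrix is the
        # Fisher information (a weighted covariance of the inputs).
        A = X.T @ (W[:, None] * X) + ridge * np.eye(d)
        w = np.linalg.solve(A, X.T @ (W * z))
    return w
```

Expanding `X.T @ (W * z)` shows the update is `w + (X^T W X)^{-1} X^T (y - p)`, i.e. a Newton/Fisher-scoring step on the log-likelihood, which is the "iterations of a weighted least squares method" interpretation the abstract refers to.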
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kurita, T. (1993). Iterative weighted least squares algorithms for neural networks classifiers. In: Doshita, S., Furukawa, K., Jantke, K.P., Nishida, T. (eds) Algorithmic Learning Theory. ALT 1992. Lecture Notes in Computer Science, vol 743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57369-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57369-2
Online ISBN: 978-3-540-48093-8