Iterative weighted least squares algorithms for neural networks classifiers


Abstract

This paper discusses learning algorithms for layered neural networks from the standpoint of maximum likelihood estimation. We first discuss learning algorithms for the simplest network, consisting of a single neuron. It is shown that the Fisher information of the network, namely the negative of the expected Hessian matrix, is given by a weighted covariance matrix of the input vectors. A learning algorithm is presented on the basis of Fisher's scoring method, which uses the Fisher information in place of the Hessian matrix in Newton's method. The algorithm can be interpreted as iterations of a weighted least squares method. These results are then extended to the layered network with one hidden layer. The Fisher information for the layered network is given by a weighted covariance matrix of the inputs of the network and the outputs of the hidden units. Since Newton's method for maximization problems runs into difficulty when the negative Hessian matrix is not positive definite, we propose a learning algorithm that uses the Fisher information matrix, which is non-negative definite, instead of the Hessian matrix. Moreover, to reduce the computation of the full Fisher information matrix, we propose another algorithm that uses only the block diagonal elements of the Fisher information. This algorithm reduces to an iterative weighted least squares algorithm in which each unit estimates its own weights by a weighted least squares method. It is experimentally shown that the proposed algorithms converge in fewer iterations than the error back-propagation (BP) algorithm.
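
To make the single-neuron case concrete, the following is a minimal Python sketch (an illustration, not the paper's implementation) of Fisher scoring for a sigmoid neuron with a Bernoulli (cross-entropy) likelihood. In this setting the Fisher information is X^T S X with S = diag(p(1-p)), a weighted covariance of the inputs, and each scoring update is exactly a weighted least squares solve. The `ridge` stabilizer and the toy data are assumptions added for the demonstration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_single_neuron(X, y, n_iter=20, ridge=1e-6):
    """Fisher-scoring / IRLS fit of a single sigmoid neuron.

    Each step solves a weighted least squares problem whose
    weights s_i = p_i * (1 - p_i) come from the Fisher information
    X^T S X.  `ridge` is a small regularizer added for numerical
    stability (an implementation assumption, not from the paper).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        s = p * (1.0 - p)                      # per-sample Fisher weights
        # Working response of the weighted least squares step:
        # z = X w + S^{-1} (y - p)
        z = X @ w + (y - p) / np.maximum(s, 1e-12)
        # Weighted normal equations: (X^T S X) w = X^T S z
        A = X.T @ (s[:, None] * X) + ridge * np.eye(d)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w

# Toy usage: two Gaussian clusters, with a bias column appended.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])          # bias term
y = np.r_[np.zeros(50), np.ones(50)]
w = irls_single_neuron(X, y)
```

In the block diagonal variant for a network with one hidden layer, the same kind of weighted least squares update would be applied to each unit's own weight vector in turn, using only that unit's block of the Fisher information rather than the full matrix.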

Author information

Takio Kurita, Ph.D.: He received the B.E. degree in 1981 from the Nagoya Institute of Technology and the Dr. Eng. degree in 1993 from the University of Tsukuba. Since 1981, he has been with the Electrotechnical Laboratory, AIST, MITI, Japan. From 1990 to 1991 he was a visiting research scientist at the Institute for Information Technology, NRC, Ottawa, Canada. His current research interests are multivariate analysis methods, neural networks, and their applications to pattern recognition.

About this article

Cite this article

Kurita, T. Iterative weighted least squares algorithms for neural networks classifiers. New Gener Comput 12, 375–394 (1994). https://doi.org/10.1007/BF03037353

