Abstract
This paper discusses learning algorithms for layered neural networks from the standpoint of maximum likelihood estimation. We first discuss learning algorithms for the simplest network, consisting of a single neuron. It is shown that the Fisher information of this network, namely the negative of the expected Hessian matrix, is given by a weighted covariance matrix of the input vectors. A learning algorithm is presented based on Fisher's scoring method, which uses the Fisher information in place of the Hessian matrix in Newton's method; the algorithm can be interpreted as iterations of a weighted least squares method. These results are then extended to layered networks with one hidden layer. The Fisher information of such a network is given by a weighted covariance matrix of the network inputs and the hidden-unit outputs. Since Newton's method for maximization problems runs into difficulty when the negative Hessian matrix is not positive definite, we propose a learning algorithm that uses the Fisher information matrix, which is non-negative definite, instead of the Hessian. Moreover, to reduce the cost of computing the full Fisher information matrix, we propose another algorithm that uses only its block-diagonal elements. This algorithm reduces to an iterative weighted least squares procedure in which each unit estimates its own weights by a weighted least squares method. Experiments show that the proposed algorithms converge in fewer iterations than the error back-propagation (BP) algorithm.
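For the single-neuron case the abstract describes, Fisher's scoring method coincides with the classical iteratively reweighted least squares (IRLS) update for logistic regression: the Fisher information is a weighted covariance of the inputs, with weights p(1-p) given by the sigmoid output. The sketch below is an illustration of that general idea, not the paper's exact algorithm; the function name and the synthetic data are our own.

```python
import numpy as np

def fisher_scoring_logistic(X, y, n_iter=10):
    """Fit a single sigmoid neuron by Fisher's scoring method.

    For the model p = sigmoid(X @ w), the Fisher information (minus the
    expected Hessian of the log-likelihood) is X.T @ diag(p*(1-p)) @ X,
    a weighted covariance matrix of the inputs, so each scoring update
    is a weighted least squares step.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # neuron output
        s = p * (1.0 - p)                  # per-sample weights
        F = X.T @ (s[:, None] * X)         # Fisher information matrix
        g = X.T @ (y - p)                  # gradient of the log-likelihood
        w = w + np.linalg.solve(F, g)      # scoring (IRLS) update
    return w

# Synthetic data from a known weight vector (bias 0.5, slope 2.0).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
p_true = 1.0 / (1.0 + np.exp(-X @ np.array([0.5, 2.0])))
y = (rng.random(200) < p_true).astype(float)
w_hat = fisher_scoring_logistic(X, y)
```

Note that on perfectly separable data the maximum likelihood weights diverge and the Fisher matrix becomes ill-conditioned, so in practice the update is often damped or regularized.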
Additional information
Takio Kurita, Ph.D.: He received the B.E. degree in 1981 from Nagoya Institute of Technology and the Dr. Eng. degree in 1993 from the University of Tsukuba. Since 1981, he has been with the Electrotechnical Laboratory, AIST, MITI, Japan. From 1990 to 1991 he was a visiting research scientist at the Institute for Information Technology, NRC, Ottawa, Canada. His current research interests are multivariate analysis methods, neural networks, and their applications to pattern recognition.
Cite this article
Kurita, T. Iterative weighted least squares algorithms for neural networks classifiers. New Gener Comput 12, 375–394 (1994). https://doi.org/10.1007/BF03037353