Abstract
The problem of reducing the size of a trained multilayer artificial neural network is addressed, and a method for removing hidden units is developed. The method is based on eliminating units and adjusting the remaining weights so that the network's performance does not worsen over the entire training set. The pruning problem is formulated as a system of linear equations, which is solved in the least-squares sense by an efficient conjugate-gradient algorithm. The algorithm also provides a sub-optimal criterion for choosing the units to be removed, which is shown to work well in practice. Preliminary results on a simulated pattern recognition task are reported, demonstrating the effectiveness of the proposed approach.
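To make the idea concrete, the sketch below illustrates the kind of least-squares pruning step the abstract describes: when a hidden unit is removed, the weights of the remaining units are adjusted so that the inputs to the next layer are preserved, as closely as possible, over all training patterns. This is only a minimal illustration under assumed names (H for the hidden activations, W_out for the hidden-to-output weights); it uses a standard least-squares routine in place of the paper's conjugate-gradient solver, and it ranks candidate units by residual as a stand-in for the paper's selection criterion.

```python
import numpy as np

def prune_one_unit(H, W_out):
    """Illustrative least-squares pruning step (not the paper's exact algorithm).

    H      : (P, n_h) hidden-unit activations over all P training patterns
    W_out  : (n_h, n_o) weights from the hidden units to the next layer

    Removing hidden unit h deletes the contribution W_out[h, j] * H[:, h] from
    each next-layer net input j; we compensate by solving, for the remaining
    units,  H_rest @ delta  ~=  outer(H[:, h], W_out[h])  in the least-squares
    sense, so that the net inputs over the training set are (approximately)
    unchanged.
    """
    P, n_h = H.shape
    best = None
    for h in range(n_h):                       # candidate unit to remove
        rest = [i for i in range(n_h) if i != h]
        H_rest = H[:, rest]                    # (P, n_h - 1)
        target = np.outer(H[:, h], W_out[h])   # (P, n_o) signal lost by removing h
        delta, *_ = np.linalg.lstsq(H_rest, target, rcond=None)
        residual = np.linalg.norm(H_rest @ delta - target)
        if best is None or residual < best[0]:
            best = (residual, h, rest, delta)
    residual, h, rest, delta = best
    W_new = W_out[rest] + delta                # adjusted weights of remaining units
    return h, W_new, residual
```

In use, the selected unit and its incoming weights would be discarded, W_out replaced by W_new, and the step repeated until performance on held-out data begins to degrade; the paper instead solves the same linear system with a conjugate-gradient method, which also yields its unit-selection criterion.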
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pelillo, M., Fanelli, A.M. (1993). A method of pruning layered feed-forward neural networks. In: Mira, J., Cabestany, J., Prieto, A. (eds) New Trends in Neural Computation. IWANN 1993. Lecture Notes in Computer Science, vol 686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56798-4_160
DOI: https://doi.org/10.1007/3-540-56798-4_160
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56798-1
Online ISBN: 978-3-540-47741-9