Abstract
It has been shown that a network with one hidden layer, when used for pattern recognition with supervised learning, converges to the optimal Bayesian classifier provided that three parameters simultaneously tend to their limiting values: the size of the learning sample and the number of cells in the hidden layer must both tend to infinity, and a mean error function over the learning sample must tend to its absolute minimum. When at least one of these parameters is held fixed (in practice the size of the learning sample), it is no longer mathematically justified to push the other two towards the values above in order to improve the solution. Much research has been devoted to determining the optimal number of cells in the hidden layer. In this paper we take a more global view and examine the joint determination of optimal values for the two free parameters: the number of hidden cells and the mean error. We exhibit an objective factor of problem complexity: the amount of overlap between classes in the representation space. Contrary to what is generally accepted, we show that networks usually regarded as oversized, even when trained for a limited time, regularly yield better results than smaller networks designed to reach the absolute minimum of the squared error during the learning phase. This phenomenon is all the more noticeable as class overlap increases. To control this factor, our experiments used an original pattern recognition problem generator, which is also described in this paper.
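The limit result the abstract refers to is the standard one from the literature on multilayer perceptrons as posterior estimators. A compact statement, under the usual assumptions of 1-of-K target coding and a squared-error criterion (this is the textbook form of the result, not a formula taken from the body of this paper):

\[
\hat f \;=\; \arg\min_{f}\; \mathbb{E}\,\lVert t - f(x)\rVert^{2}
\quad\Longrightarrow\quad
\hat f_{k}(x) \;=\; \mathbb{E}[t_{k}\mid x] \;=\; P(C_{k}\mid x),
\]

so that deciding \(\hat k(x) = \arg\max_{k}\hat f_{k}(x)\) reproduces the optimal Bayesian classifier, whose error \(P_{e}^{*} = \mathbb{E}_{x}\bigl[1 - \max_{k} P(C_{k}\mid x)\bigr]\) is the floor that class overlap imposes on any classifier.

The abstract also mentions a problem generator whose class overlap can be controlled. The paper's own generator is not reproduced here; the following minimal sketch only illustrates the idea with two isotropic Gaussian classes, where a single `separation` parameter (an assumption of this sketch, not a parameter from the paper) sets the amount of overlap and hence the Bayes error.

```python
# Minimal sketch of a two-class problem generator with controllable class
# overlap (illustrative only; the generator described in the paper is not
# reproduced here, and all names/parameters below are assumptions).
import numpy as np
from math import erf, sqrt

def make_problem(n_samples, dim=2, separation=2.0, seed=0):
    """Draw a balanced two-class sample from two isotropic unit-variance Gaussians.

    `separation` is the distance between the class means in units of the
    common standard deviation; smaller values mean more class overlap.
    """
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    mean0 = np.zeros(dim)
    mean1 = np.zeros(dim)
    mean1[0] = separation                     # shift class 1 along the first axis
    x0 = rng.normal(mean0, 1.0, size=(half, dim))
    x1 = rng.normal(mean1, 1.0, size=(half, dim))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(half, dtype=int), np.ones(half, dtype=int)])
    return x, y

def bayes_error(separation):
    """Bayes error of the two-Gaussian problem above, with equal priors.

    The optimal rule thresholds the first coordinate halfway between the
    means, so the error is Phi(-separation / 2), Phi being the standard
    normal CDF.
    """
    return 0.5 * (1.0 + erf((-separation / 2.0) / sqrt(2.0)))

if __name__ == "__main__":
    for sep in (0.5, 1.0, 2.0, 4.0):          # decreasing class overlap
        x, y = make_problem(10_000, separation=sep)
        print(f"separation={sep:3.1f}  Bayes error ~ {bayes_error(sep):.3f}")
```

With `separation=0.5` the classes overlap heavily (Bayes error around 40%), while `separation=4.0` makes them almost separable; sweeping this parameter is one way to generate problems along the class-overlap axis that the experiments vary.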
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Verley, G., Asselin de Beauville, J.P. (1996). Multilayer perceptron learning control. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024726
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6