Abstract
It has been shown that a network with one hidden layer, when used for pattern recognition with supervised learning, converges to the optimal Bayesian classifier provided that three parameters simultaneously tend to their limiting values: the size of the learning sample and the number of cells in the hidden layer must both tend to infinity, and a mean error function over the learning sample must tend to its absolute minimum. When at least one of these parameters is held fixed (in practice the size of the learning sample), it is no longer mathematically justified to push the other two towards the values above in order to improve the solution. Much research has been devoted to determining the optimal number of cells in the hidden layer. In this paper we take a more global view and examine the joint determination of optimal values for the two free parameters: the number of hidden cells and the mean error. We exhibit an objective factor of problem complexity: the amount of overlap between classes in the representation space. Contrary to what is generally accepted, we show that networks usually regarded as oversized, even when trained for a limited time, regularly yield better results than smaller networks designed to reach the absolute minimum of the squared error during the learning phase. This phenomenon is all the more noticeable as class overlap increases. To control this factor, our experiments used an original pattern recognition problem generator, which is also described in this paper.
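The limit result the abstract refers to is the standard one from the literature on multilayer perceptrons as posterior estimators. A compact statement, under the usual assumptions of 1-of-K target coding and a squared-error criterion (this is the textbook form of the result, not a formula taken from the body of this paper):

\[
\hat f \;=\; \arg\min_{f}\; \mathbb{E}\,\lVert t - f(x)\rVert^{2}
\quad\Longrightarrow\quad
\hat f_{k}(x) \;=\; \mathbb{E}[t_{k}\mid x] \;=\; P(C_{k}\mid x),
\]

so that deciding \(\hat k(x) = \arg\max_{k}\hat f_{k}(x)\) reproduces the optimal Bayesian classifier, whose error \(P_{e}^{*} = \mathbb{E}_{x}\bigl[1 - \max_{k} P(C_{k}\mid x)\bigr]\) is the floor that class overlap imposes on any classifier.

The abstract also mentions a problem generator whose class overlap can be controlled. The paper's own generator is not reproduced here; the following minimal sketch only illustrates the idea with two isotropic Gaussian classes, where a single `separation` parameter (an assumption of this sketch, not a parameter from the paper) sets the amount of overlap and hence the Bayes error.

```python
# Minimal sketch of a two-class problem generator with controllable class
# overlap (illustrative only; the generator described in the paper is not
# reproduced here, and all names/parameters below are assumptions).
import numpy as np
from math import erf, sqrt

def make_problem(n_samples, dim=2, separation=2.0, seed=0):
    """Draw a balanced two-class sample from two isotropic unit-variance Gaussians.

    `separation` is the distance between the class means in units of the
    common standard deviation; smaller values mean more class overlap.
    """
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    mean0 = np.zeros(dim)
    mean1 = np.zeros(dim)
    mean1[0] = separation                     # shift class 1 along the first axis
    x0 = rng.normal(mean0, 1.0, size=(half, dim))
    x1 = rng.normal(mean1, 1.0, size=(half, dim))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(half, dtype=int), np.ones(half, dtype=int)])
    return x, y

def bayes_error(separation):
    """Bayes error of the two-Gaussian problem above, with equal priors.

    The optimal rule thresholds the first coordinate halfway between the
    means, so the error is Phi(-separation / 2), Phi being the standard
    normal CDF.
    """
    return 0.5 * (1.0 + erf((-separation / 2.0) / sqrt(2.0)))

if __name__ == "__main__":
    for sep in (0.5, 1.0, 2.0, 4.0):          # decreasing class overlap
        x, y = make_problem(10_000, separation=sep)
        print(f"separation={sep:3.1f}  Bayes error ~ {bayes_error(sep):.3f}")
```

With `separation=0.5` the classes overlap heavily (Bayes error around 40%), while `separation=4.0` makes them almost separable; sweeping this parameter is one way to generate problems along the class-overlap axis that the experiments vary.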
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Verley, G., Asselin de Beauville, J.P. (1996). Multilayer perceptron learning control. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024726
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6