Abstract
This article proposes a stochastic method for determining the number of hidden nodes of a multilayer perceptron trained by the backpropagation algorithm. During the learning process, an auxiliary Markovian algorithm controls the sizing of the hidden layers. As in other dynamic-sizing schemes, the main idea is to promote the addition of nodes the closer the net is to a stall configuration, and to remove units that are not sufficiently "lively". The combined algorithm produces families of nets that converge quickly toward well-trained nets with a small number of nodes. Numerical experiments are performed both on conventional benchmarks and on realistic learning problems. These experiments show that, for learning tasks of sufficiently high complexity, the additional complexity of our method (with respect to conventional fixed-architecture training) is compensated by faster convergence and a higher success rate in reaching the minimum of the error function.
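The abstract's control idea can be illustrated with a minimal sketch: a backpropagation loop that periodically adds a hidden unit with a probability that grows as error improvement vanishes (a stall), and prunes units whose activations barely vary (not "lively"). The class name, thresholds, and the exponential probability rule below are illustrative assumptions, not the authors' actual Markovian controller.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DynamicMLP:
    """One-hidden-layer perceptron whose hidden layer can grow and shrink."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1.T + self.b1)      # hidden activations
        return sigmoid(self.h @ self.W2.T + self.b2)

    def backprop(self, X, T, lr=0.5):
        y = self.forward(X)
        dy = (y - T) * y * (1 - y)                     # output-layer delta
        dh = (dy @ self.W2) * self.h * (1 - self.h)    # hidden-layer delta
        self.W2 -= lr * dy.T @ self.h / len(X)
        self.b2 -= lr * dy.mean(0)
        self.W1 -= lr * dh.T @ X / len(X)
        self.b1 -= lr * dh.mean(0)
        return float(((y - T) ** 2).mean())

    def add_node(self):
        """Append one randomly initialized hidden unit."""
        self.W1 = np.vstack([self.W1, rng.normal(0, 0.5, (1, self.W1.shape[1]))])
        self.b1 = np.append(self.b1, 0.0)
        self.W2 = np.hstack([self.W2, rng.normal(0, 0.5, (self.W2.shape[0], 1))])

    def prune_dull(self, X, var_min=1e-3):
        """Remove units whose activation variance ('liveliness') is negligible."""
        self.forward(X)
        lively = self.h.var(axis=0) > var_min
        if lively.sum() >= 2:                          # keep at least two units
            self.W1, self.b1 = self.W1[lively], self.b1[lively]
            self.W2 = self.W2[:, lively]

# Toy task: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
net = DynamicMLP(2, 2, 1)
prev_err = np.inf
for epoch in range(5000):
    err = net.backprop(X, T)
    if epoch % 200 == 199:
        # Stall heuristic: the smaller the recent improvement, the more
        # likely a node is added (illustrative probability rule).
        if rng.random() < np.exp(-1e3 * max(prev_err - err, 0.0)):
            net.add_node()
        net.prune_dull(X)
        prev_err = err
print(f"final error {err:.4f} with {net.W1.shape[0]} hidden units")
```

The split between a deterministic learner and a stochastic size controller mirrors the structure described in the abstract: growth pressure is tied to how stalled the error is, while pruning is a purely local test on each unit's activity.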
Additional information
This work has been supported by Progetto Finalizzato Sistemi Informativi e Calcolo Parallelo of CNR under grant no. 91.00.884. PF 69
Cite this article
Apolloni, B., Ronchini, G. Dynamic sizing of multilayer perceptrons. Biol. Cybern. 71, 49–63 (1994). https://doi.org/10.1007/BF00198911