
Dynamic sizing of multilayer perceptrons

Published in Biological Cybernetics.

Abstract

This article proposes a stochastic method for determining the number of hidden nodes of a multilayer perceptron trained by a backpropagation algorithm. During the learning process, an auxiliary Markovian algorithm controls the sizing of the hidden layers. The main idea is, as usual, to promote the addition of hidden nodes as the net approaches a stalled configuration, and to remove units that are not sufficiently “lively”. The combined algorithm produces families of nets that converge quickly towards well-trained nets with a small number of nodes. Numerical experiments are performed both on conventional benchmarks and on realistic learning problems. These experiments show that, for learning tasks of sufficiently high complexity, the overhead of our method with respect to conventional fixed-architecture training is compensated by faster convergence and a higher success rate in reaching the minimum of the error function.
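The full procedure is not reproduced on this page, but the idea described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' algorithm: the stall test (recent error progress), the “liveliness” measure (outgoing-weight magnitude), the XOR task, and all thresholds and constants are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR task; a stand-in for the benchmarks mentioned in the abstract.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """One-hidden-layer perceptron whose hidden layer can grow and shrink."""

    def __init__(self, n_in, n_hid, n_out):
        self.W1 = rng.normal(0, 0.5, (n_in, n_hid))
        self.b1 = np.zeros(n_hid)
        self.W2 = rng.normal(0, 0.5, (n_hid, n_out))
        self.b2 = np.zeros(n_out)

    def backprop(self, X, Y, lr=0.5):
        # Standard backpropagation step on the mean squared error.
        h = sigmoid(X @ self.W1 + self.b1)
        o = sigmoid(h @ self.W2 + self.b2)
        d_o = (o - Y) * o * (1 - o)
        d_h = (d_o @ self.W2.T) * h * (1 - h)
        self.W2 -= lr * h.T @ d_o
        self.b2 -= lr * d_o.sum(0)
        self.W1 -= lr * X.T @ d_h
        self.b1 -= lr * d_h.sum(0)
        return float(((o - Y) ** 2).mean())

    def add_node(self):
        # Grow the hidden layer by one randomly initialised unit.
        self.W1 = np.hstack([self.W1, rng.normal(0, 0.5, (self.W1.shape[0], 1))])
        self.b1 = np.append(self.b1, 0.0)
        self.W2 = np.vstack([self.W2, rng.normal(0, 0.5, (1, self.W2.shape[1]))])

    def remove_node(self, j):
        # Delete hidden unit j from both weight layers.
        self.W1 = np.delete(self.W1, j, axis=1)
        self.b1 = np.delete(self.b1, j)
        self.W2 = np.delete(self.W2, j, axis=0)

net = MLP(2, 2, 1)
prev_err = np.inf
for epoch in range(4000):
    err = net.backprop(X, Y)
    if epoch % 200 == 199:
        progress = prev_err - err
        prev_err = err
        # Stochastic growth: the smaller the recent progress (the closer the
        # net is to a stall), the higher the probability of adding a node.
        if err > 0.01 and rng.random() < np.exp(-50.0 * max(progress, 0.0)):
            net.add_node()
        # Pruning: drop units that are not "lively", here measured by the
        # magnitude of their outgoing weights (an assumed criterion).
        liveliness = np.abs(net.W2).sum(axis=1)
        for j in reversed(np.flatnonzero(liveliness < 0.05)):
            if net.W1.shape[1] > 1:
                net.remove_node(j)

print("hidden units:", net.W1.shape[1], "final error:", round(err, 4))
```

The controller runs as a side process interleaved with ordinary backpropagation steps, so the net's size is itself a stochastic process driven by the training error, which is the spirit of the Markovian sizing scheme the abstract describes.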



Additional information

This work has been supported by Progetto Finalizzato Sistemi Informativi e Calcolo Parallelo of CNR under grant no. 91.00.884. PF 69

Cite this article

Apolloni, B., Ronchini, G. Dynamic sizing of multilayer perceptrons. Biol. Cybern. 71, 49–63 (1994). https://doi.org/10.1007/BF00198911
