Abstract
This article proposes a stochastic method for determining the number of hidden nodes of a multilayer perceptron trained by the backpropagation algorithm. During the learning process, an auxiliary Markovian algorithm controls the sizing of the hidden layers. As in other dynamic-sizing schemes, the main idea is to promote the addition of nodes the closer the net is to a stall configuration, and to remove units that are not sufficiently "lively". The combined algorithm produces families of nets that converge quickly toward well-trained nets with a small number of nodes. Numerical experiments are performed both on conventional benchmarks and on realistic learning problems. These experiments show that, for learning tasks of sufficiently high complexity, the additional complexity of our method (with respect to conventional fixed-architecture training) is compensated by faster convergence and a higher success rate in reaching the minimum of the error function.
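The abstract's control idea can be illustrated with a minimal sketch: a backpropagation loop that periodically adds a hidden unit with a probability that grows as error improvement vanishes (a stall), and prunes units whose activations barely vary (not "lively"). The class name, thresholds, and the exponential probability rule below are illustrative assumptions, not the authors' actual Markovian controller.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DynamicMLP:
    """One-hidden-layer perceptron whose hidden layer can grow and shrink."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1.T + self.b1)      # hidden activations
        return sigmoid(self.h @ self.W2.T + self.b2)

    def backprop(self, X, T, lr=0.5):
        y = self.forward(X)
        dy = (y - T) * y * (1 - y)                     # output-layer delta
        dh = (dy @ self.W2) * self.h * (1 - self.h)    # hidden-layer delta
        self.W2 -= lr * dy.T @ self.h / len(X)
        self.b2 -= lr * dy.mean(0)
        self.W1 -= lr * dh.T @ X / len(X)
        self.b1 -= lr * dh.mean(0)
        return float(((y - T) ** 2).mean())

    def add_node(self):
        """Append one randomly initialized hidden unit."""
        self.W1 = np.vstack([self.W1, rng.normal(0, 0.5, (1, self.W1.shape[1]))])
        self.b1 = np.append(self.b1, 0.0)
        self.W2 = np.hstack([self.W2, rng.normal(0, 0.5, (self.W2.shape[0], 1))])

    def prune_dull(self, X, var_min=1e-3):
        """Remove units whose activation variance ('liveliness') is negligible."""
        self.forward(X)
        lively = self.h.var(axis=0) > var_min
        if lively.sum() >= 2:                          # keep at least two units
            self.W1, self.b1 = self.W1[lively], self.b1[lively]
            self.W2 = self.W2[:, lively]

# Toy task: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
net = DynamicMLP(2, 2, 1)
prev_err = np.inf
for epoch in range(5000):
    err = net.backprop(X, T)
    if epoch % 200 == 199:
        # Stall heuristic: the smaller the recent improvement, the more
        # likely a node is added (illustrative probability rule).
        if rng.random() < np.exp(-1e3 * max(prev_err - err, 0.0)):
            net.add_node()
        net.prune_dull(X)
        prev_err = err
print(f"final error {err:.4f} with {net.W1.shape[0]} hidden units")
```

The split between a deterministic learner and a stochastic size controller mirrors the structure described in the abstract: growth pressure is tied to how stalled the error is, while pruning is a purely local test on each unit's activity.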
Additional information
This work has been supported by Progetto Finalizzato Sistemi Informativi e Calcolo Parallelo of CNR under grant no. 91.00.884. PF 69
Cite this article
Apolloni, B., Ronchini, G. Dynamic sizing of multilayer perceptrons. Biol. Cybern. 71, 49–63 (1994). https://doi.org/10.1007/BF00198911