Abstract
Adjusting the number of hidden neurons during training is common practice, and the removal of neurons plays an indispensable role in this architecture manipulation. In this paper, a succinct and unified mathematical form for removing neurons, based on orthogonal projection and crosswise propagation in a feedforward layer, is generalized to the generic case and developed for several neural network architectures. For a trained neural network, the method proceeds in three stages. In the first stage, the output vectors of the feedforward observation layer are grouped into clusters. In the second stage, orthogonal projection is performed to locate, within each cluster, the neuron whose output vector can be approximated by the output vectors of the other neurons in the same cluster with the least information loss. In the third stage, the located neuron is removed and crosswise propagation is carried out in each cluster. After the three stages, the neural network with the pruned architecture is retrained. When the number of clusters is one, the method degenerates to its special case in which only one neuron is removed. Applications to neural networks with different architectures, with an extension to the support vector machine, are exemplified. The methodology provides theoretical support for large-scale applications of neural networks in the real world. In addition, with minor modifications, the unified method is instructive for pruning other networks whose structure is similar to those considered in this paper. We conclude that the unified pruning method equips us with an effective and powerful tool for simplifying neural network architectures.
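To make stages two and three concrete, the following is a minimal NumPy sketch of the single-cluster special case mentioned above (one neuron removed). All names here are hypothetical and the per-cluster bookkeeping of the full method is omitted: it locates the hidden neuron whose output vector is best approximated, in the least-squares (orthogonal-projection) sense, by the outputs of the other neurons, removes it, and folds its outgoing weights into the remaining neurons by crosswise propagation.

import numpy as np

def prune_one_neuron(H, W_out):
    """Remove the hidden neuron whose output vector is best approximated
    by the outputs of the other neurons, and compensate the next layer's
    weights by crosswise propagation.

    H     : (n_samples, n_neurons) matrix of hidden-layer outputs.
    W_out : (n_neurons, n_outputs) outgoing weight matrix.
    Returns the index of the removed neuron and the updated W_out.
    """
    n = H.shape[1]
    best_j, best_err, best_c = None, np.inf, None
    for j in range(n):
        rest = np.delete(H, j, axis=1)            # outputs of the other neurons
        # Orthogonal projection of column j onto the span of the others:
        c, *_ = np.linalg.lstsq(rest, H[:, j], rcond=None)
        err = np.linalg.norm(rest @ c - H[:, j])  # residual = information loss
        if err < best_err:
            best_j, best_err, best_c = j, err, c
    # Crosswise propagation: since h_j is approximately sum_k c_k * h_k,
    # add c_k-scaled copies of the removed neuron's outgoing weights
    # to the outgoing weights of each remaining neuron k.
    keep = [k for k in range(n) if k != best_j]
    W_new = W_out[keep] + np.outer(best_c, W_out[best_j])
    return best_j, W_new

With clustering (stage one), the same projection and weight update would be applied within each cluster before the pruned network is retrained.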
Acknowledgments
The project was sponsored by the NSF of China under grant numbers 70571003 and 70871001, and by the 863 Project of China under grant number 2007AA01Z437.
Cite this article
Liang, X., Chen, RC. A unified mathematical form for removing neurons based on orthogonal projection and crosswise propagation. Neural Comput & Applic 19, 445–457 (2010). https://doi.org/10.1007/s00521-009-0321-8