Abstract
Adjusting the number of hidden neurons during training is common practice, and the removal of neurons plays an indispensable role in this architecture manipulation. In this paper, a succinct and unified mathematical form for removing neurons, based on orthogonal projection and crosswise propagation in a feedforward layer, is generalized to the generic case and developed for several neural network architectures. For a trained neural network, the method proceeds in three stages. In the first stage, the output vectors of the feedforward observation layer are grouped into clusters. In the second stage, orthogonal projection is performed to locate, within each cluster, the neuron whose output vector can be approximated by the output vectors of the other neurons in the same cluster with the least information loss. In the third stage, the located neuron is removed and crosswise propagation is carried out in each cluster. After the three stages, the neural network with the pruned architecture is retrained. When the number of clusters is one, the method degenerates to its special case in which only one neuron is removed. Applications to neural networks with different architectures, with an extension to the support vector machine, are exemplified. The methodology provides theoretical support for large-scale applications of neural networks in the real world. In addition, with minor modifications, the unified method is instructive for pruning other networks whose structure is similar to those considered in this paper. We conclude that the unified pruning method equips us with an effective and powerful tool for simplifying neural network architectures.
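To make stages two and three concrete, the following is a minimal NumPy sketch of the single-cluster special case mentioned above (one neuron removed). All names here are hypothetical and the per-cluster bookkeeping of the full method is omitted: it locates the hidden neuron whose output vector is best approximated, in the least-squares (orthogonal-projection) sense, by the outputs of the other neurons, removes it, and folds its outgoing weights into the remaining neurons by crosswise propagation.

import numpy as np

def prune_one_neuron(H, W_out):
    """Remove the hidden neuron whose output vector is best approximated
    by the outputs of the other neurons, and compensate the next layer's
    weights by crosswise propagation.

    H     : (n_samples, n_neurons) matrix of hidden-layer outputs.
    W_out : (n_neurons, n_outputs) outgoing weight matrix.
    Returns the index of the removed neuron and the updated W_out.
    """
    n = H.shape[1]
    best_j, best_err, best_c = None, np.inf, None
    for j in range(n):
        rest = np.delete(H, j, axis=1)            # outputs of the other neurons
        # Orthogonal projection of column j onto the span of the others:
        c, *_ = np.linalg.lstsq(rest, H[:, j], rcond=None)
        err = np.linalg.norm(rest @ c - H[:, j])  # residual = information loss
        if err < best_err:
            best_j, best_err, best_c = j, err, c
    # Crosswise propagation: since h_j is approximately sum_k c_k * h_k,
    # add c_k-scaled copies of the removed neuron's outgoing weights
    # to the outgoing weights of each remaining neuron k.
    keep = [k for k in range(n) if k != best_j]
    W_new = W_out[keep] + np.outer(best_c, W_out[best_j])
    return best_j, W_new

With clustering (stage one), the same projection and weight update would be applied within each cluster before the pruned network is retrained.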
Acknowledgments
The project was sponsored by the NSF of China under grant numbers 70571003 and 70871001, and by the 863 Project of China under grant number 2007AA01Z437.
Cite this article
Liang, X., Chen, RC. A unified mathematical form for removing neurons based on orthogonal projection and crosswise propagation. Neural Comput & Applic 19, 445–457 (2010). https://doi.org/10.1007/s00521-009-0321-8