A unified mathematical form for removing neurons based on orthogonal projection and crosswise propagation

  • Original Article
Neural Computing and Applications

Abstract

Adjusting the number of hidden neurons during training is common practice, and the removal of neurons plays an indispensable role in this architecture manipulation. In this paper, a succinct and unified mathematical form for removing neurons based on orthogonal projection and crosswise propagation in a feedforward layer is generalized to the generic case and developed for several neural networks with different architectures. For a trained neural network, the method proceeds in three stages. In the first stage, the output vectors of the feedforward observation layer are grouped into clusters. In the second stage, orthogonal projection is performed to locate, in each cluster, the neuron whose output vector can be approximated by the other output vectors in the same cluster with the least information loss. In the third stage, the located neuron is removed and crosswise propagation is carried out in each cluster. After the three stages, the neural network with the pruned architecture is retrained. If the number of clusters is one, the method degenerates into the special case in which only one neuron is removed. Applications to neural networks with different architectures, with an extension to the support vector machine, are exemplified. The methodology supports, in theory, large-scale real-world applications of neural networks. In addition, with minor modifications, the unified method is instructive for pruning other networks, as long as they have a network structure similar to the ones in this paper. It is concluded that the unified pruning method equips us with an effective and powerful tool for simplifying the architecture of neural networks.
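The three-stage procedure lends itself to a compact implementation. The sketch below is a minimal illustration, not the authors' code: it collects the hidden-neuron outputs over the training set as columns of a matrix, uses least squares (orthogonal projection onto the span of the other columns in a cluster) to locate the most redundant neuron in each cluster, and assumes that crosswise propagation folds the removed neuron's outgoing weights into its cluster mates via the projection coefficients; the pruned network would then be retrained as the abstract prescribes. The function name and variable layout are hypothetical.

import numpy as np

def prune_one_neuron_per_cluster(H, W_out, labels):
    # H      : (P, n) outputs of the n observed hidden neurons over P training samples
    # W_out  : (n, m) outgoing weights of those neurons
    # labels : length-n cluster index for each neuron (a single cluster reduces to
    #          the special case of removing only one neuron)
    labels = np.asarray(labels)
    n = H.shape[1]
    keep = np.ones(n, dtype=bool)
    W_new = W_out.astype(float).copy()

    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if idx.size < 2:
            continue  # no other neurons available to absorb this one

        best = None
        for k in idx:
            others = idx[idx != k]
            # Orthogonal projection: least-squares approximation of column k
            # by the remaining columns of the same cluster.
            coef, *_ = np.linalg.lstsq(H[:, others], H[:, k], rcond=None)
            err = np.linalg.norm(H[:, others] @ coef - H[:, k])
            if best is None or err < best[0]:
                best = (err, k, others, coef)

        _, k, others, coef = best
        # Crosswise propagation (assumed form): redistribute the removed neuron's
        # outgoing weights to its cluster mates so that the layer's contribution
        # to the next layer is approximately preserved.
        W_new[others] += np.outer(coef, W_out[k])
        keep[k] = False

    kept = np.where(keep)[0]
    return kept, W_new[kept]

In use, the cluster labels could come from, say, a k-means clustering of the hidden-output vectors, and the returned indices and compensated weights would be installed back into the network before retraining.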



Acknowledgments

The project was sponsored by the NSF of China under grant numbers 70571003 and 70871001, and by the 863 Project of China under grant number 2007AA01Z437.

Author information

Corresponding author

Correspondence to Xun Liang.

About this article

Cite this article

Liang, X., Chen, RC. A unified mathematical form for removing neurons based on orthogonal projection and crosswise propagation. Neural Comput & Applic 19, 445–457 (2010). https://doi.org/10.1007/s00521-009-0321-8
