A novel weight pruning method for MLP classifiers based on the MAXCORE principle

Abstract

We introduce a novel weight pruning methodology for MLP classifiers that can be used for model and/or feature selection purposes. The main concept underlying the proposed method is the MAXCORE principle, which is based on the observation that relevant synaptic weights tend to generate higher correlations between error signals associated with the neurons of a given layer and the error signals propagated back to the previous layer. Nonrelevant (i.e. prunable) weights tend to generate smaller correlations. Using the MAXCORE as a guiding principle, we perform a cross-correlation analysis of the error signals at successive layers. Weights for which the cross-correlations are smaller than a user-defined error tolerance are gradually discarded. Computer simulations using synthetic and real-world data sets show that the proposed method performs consistently better than standard pruning techniques, with much lower computational costs.
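
To make the general idea concrete, the following NumPy sketch illustrates the thresholding step for a single hidden-to-output weight matrix. It is only a rough illustration under assumed names (delta_out, delta_hid, tau) and shapes, not the authors' implementation; in the full method the analysis proceeds layer by layer and the pruned network is retrained.

    import numpy as np

    def correlation_prune(M, delta_out, delta_hid, tau=1e-2):
        """Illustrative correlation-guided pruning of a hidden-to-output weight
        matrix (a sketch of the MAXCORE idea, not the authors' exact procedure).

        M         : (K, H) hidden-to-output weights.
        delta_out : (N, K) output-layer error signals over N training patterns.
        delta_hid : (N, H) error signals back-propagated to the hidden layer.
        tau       : user-defined tolerance; weights whose associated
                    cross-correlation magnitude falls below it are discarded.
        """
        # Sample cross-correlation between the error of output neuron k and
        # the back-propagated error of hidden neuron i.
        C = np.abs(delta_out.T @ delta_hid) / delta_out.shape[0]  # shape (K, H)
        keep = C >= tau               # relevant weights show large correlations
        return M * keep, keep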

Notes

  1. The AIC has the following structure \(AIC=-2\ln(\varepsilon_{\rm train})+2N_c\) [23]; a small numeric sketch is given after these notes.

  2. Since the proposed approach is dependent on the classifier model, it belongs to the class of wrappers for feature subset selection [16].

  3. Recall that the task now is feature selection, not pattern classification. Thus, we can train the network with all the available pattern vectors.
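
For concreteness, the AIC expression in Note 1 can be read as the small helper below (the function and argument names are ours):

    import numpy as np

    def aic(eps_train, n_connections):
        """AIC = -2 ln(eps_train) + 2 N_c, exactly as written in Note 1."""
        return -2.0 * np.log(eps_train) + 2.0 * n_connections

    # Between two pruned architectures with the same training error, the one
    # with fewer remaining connections obtains the lower (preferred) AIC.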

References

  1. Aran O, Yildiz OT, Alpaydin E (2009) An incremental framework based on cross-validation for estimating the architecture of a multilayer perceptron. Int J Pattern Recogn Artif Intell 23(2):159–190

  2. Benardos PG, Vosniakos GC (2007) Optimizing feedforward artificial neural network architecture. Eng Appl Artif Intell 20(3):365–382

  3. Berthonnaud E, Dimnet J, Roussouly P, Labelle H (2005) Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters. J Spinal Disorders Tech 18(1):40–47

  4. Bishop CM (1992) Exact calculation of the hessian matrix for the multi-layer perceptron. Neural Comput 4(4):494–501

  5. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford

  6. Castellano G, Fanelli AM, Pelillo M (1997) An iterative pruning algorithm for feedforward neural networks. IEEE Trans Neural Netw 8(3):519–531

  7. Cataltepe Z, Abu-Mostafa YS, Magdon-Ismail M (1999) No free lunch for early stopping. Neural Comput 11(4):995–1009

  8. Curry B, Morgan PH (2006) Model selection in neural networks: some difficulties. Eur J Oper Res 170(2):567–577

  9. Dandurand F, Berthiaume V, Shultz TR (2007) A systematic comparison of flat and standard cascade-correlation using a student-teacher network approximation task. Connect Sci 19(3):223–244

  10. Delogu R, Fanni A, Montisci A (2008) Geometrical synthesis of MLP neural networks. Neurocomputing 71:919–930

  11. Engelbrecht AP (2001) A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans Neural Netw 12(6):1386–1399

  12. Fahlman SE, Lebiere C (1990) The cascade-correlation learning architecture. In: Touretzky DS (ed) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 2, pp 524–532

  13. Gómez I, Franco L, Jerez JM (2009) Neural network architecture selection: can function complexity help? Neural Process Lett 30:71–87

  14. Hammer B, Micheli A, Sperduti A (2006) Universal approximation capability of cascade correlation for structures. Neural Comput 17(5):1109–1159

  15. Hassibi B, Stork DG (1993) Second order derivatives for network pruning: optimal brain surgeon. In: Hanson SJ, Cowan JD, Giles CL (eds) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 5, pp 164–171

  16. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  17. Littmann E, Ritter H (1996) Learning and generalization in cascade network architectures. Neural Comput 8(7):1521–1539

  18. Luukka P (2011) Feature selection using fuzzy entropy measures with similarity classifier. Exp Syst Appl 38(4):4600–4607

  19. Moustakidis S, Theocharis J (2010) SVM-FuzCoC: a novel SVM-based feature selection method using a fuzzy complementary criterion. Pattern Recogn 43(11):3712–3729

  20. Nakamura T, Judd K, Mees AI, Small M (2006) A comparative study of information criteria for model selection. Int J Bifur Chaos 16(8):2153–2175

  21. Parekh R, Yang J, Honavar V (2000) Constructive neural-network learning algorithms for pattern classification. IEEE Trans Neural Netw 11(2):436–451

  22. Platt JC (1998) Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208

  23. Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems. Wiley, London

  24. Reed R (1993) Pruning algorithms—a survey. IEEE Trans Neural Netw 4(5):740–747

  25. Rocha M, Cortez P, Neves J (2007) Evolution of neural networks for classification and regression. Neurocomputing 70(16–18):1054–1060

  26. Rocha Neto AR, Barreto GA (2009) On the application of ensembles of classifiers to the diagnosis of pathologies of the vertebral column: a comparative analysis. IEEE Latin Am Trans 7(4):487–496

  27. Saxena A, Saad A (2007) Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Appl Soft Comput 7(1):441–454

  28. Seghouane AK, Amari SI (2007) The AIC criterion and symmetrizing the Kullback-Leibler divergence. IEEE Trans Neural Netw 18(1):97–106

  29. Stathakis D, Kanellopoulos I (2008) Global optimization versus deterministic pruning for the classification of remotely sensed imagery. Photogrammetr Eng Remote Sens 74(10):1259–1265

  30. Trenn S (2008) Multilayer perceptrons: approximation order and necessary number of hidden units. IEEE Trans Neural Netw 19(5):836–844

  31. Wan W, Mabu S, Shimada K, Hirasawa K, Hu J (2009) Enhancing the generalization ability of neural networks through controlling the hidden layers. Appl Soft Comput 9(1):404–414

  32. Weigend AS, Rumelhart DE, Huberman AB (1990) Generalization by weight-elimination with application to forecasting. In: Lippmann RP, Moody J, Touretzky DS (eds) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 3, pp 875–882

  33. Xiang C, Ding SQ, Lee TH (2005) Geometric interpretation and architecture selection of the MLP. IEEE Trans Neural Netw 16(1):84–96

  34. Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447

  35. Yu J, Wang S, Xi L (2008) Evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing 71(4–6):1054–1060

Acknowledgments

The authors thank Prof. Ajalmar Rêgo da Rocha Neto (Federal Institute of Ceará—IFCE) for running the experiments with the SVM classifiers on the vertebral column data set. We also thank the anonymous reviewers for their valuable suggestions for improving this paper.

Author information

Corresponding author

Correspondence to Guilherme A. Barreto.

Appendix

The WDE algorithm originates from a regularization method that modifies the error function by adding a term that penalizes large weights. As a consequence, Eqs. 7 and 8 are now written as [23]

$$ \begin{aligned} m_{ki}(t+1) &= m_{ki}(t)\left( 1 - \frac{\lambda}{(1 + m_{ki}^2(t))^2}\right) + \eta \delta_{k}^{(o)}(t) y_{i}^{(h)}(t),\\ w_{ij}(t+1) &= w_{ij}(t)\left( 1 -\frac{ \lambda}{(1 + w_{ij}^2(t))^2}\right) + \eta \delta_{i}^{(h)}(t) x_j(t), \end{aligned} $$

where 0 < λ < 1 is a user-defined parameter.
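
A minimal NumPy sketch of one such update for a generic layer is given below; the function name, array shapes, and the default values of η and λ are assumptions made only for illustration.

    import numpy as np

    def wde_update(W, delta, inputs, eta=0.05, lam=1e-4):
        """One weight-elimination update for a single layer, following the
        equations above (the same form applies to both w_ij and m_ki).

        W      : (n_out, n_in) weight matrix of the layer.
        delta  : (n_out,) local error signals (delta^(h) or delta^(o)).
        inputs : (n_in,) activations feeding the layer (x or y^(h)).
        eta    : learning rate; lam : penalty parameter, 0 < lam < 1.
        """
        decay = 1.0 - lam / (1.0 + W ** 2) ** 2   # element-wise shrinking factor
        return W * decay + eta * np.outer(delta, inputs)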

The OBS algorithm [15] requires that the weights are ranked based on the computation of weight saliencies defined as

$$ S_i = \Delta E_i = \frac{1}{2} \frac{\omega_i^2}{[{\mathbf{H}}^{-1}]_{ii}} \qquad (21) $$

where \(\omega_i\) is the ith weight (or bias) of interest and \([{\mathbf{H}}^{-1}]_{ii}\) is the ith diagonal entry of the inverse of the Hessian matrix \({\mathbf{H}} = [H_{ij}] = \frac{\partial^2 E }{\partial \omega_i \partial \omega_j}\).
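
In code, the saliency computation of Eq. 21 amounts to the sketch below, assuming the flat weight vector and the inverse Hessian have already been obtained:

    import numpy as np

    def obs_saliencies(weights, H_inv):
        """Saliencies S_i = 0.5 * w_i^2 / [H^{-1}]_{ii} from Eq. 21.

        weights : flat vector containing every weight and bias.
        H_inv   : inverse of the Hessian of the error w.r.t. the weights.
        """
        return 0.5 * weights ** 2 / np.diag(H_inv)

    # The weight with the smallest saliency is the first candidate for removal.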

Pruning by weight magnitude (PWM) is a pruning method based on the elimination of small-magnitude weights [5]. Weights are sorted in increasing order of magnitude. Starting from the smallest one, each weight is pruned as long as its elimination does not reduce the classification rate on the training data set below a predefined threshold; see the sketch below.
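
The sketch below summarizes one possible reading of this procedure; eval_acc is a hypothetical callback returning the training classification rate with the masked weights treated as zero, and the stopping rule (stop at the first harmful removal) is our interpretation.

    import numpy as np

    def prune_by_magnitude(weights, eval_acc, min_acc):
        """Magnitude-based pruning (PWM) sketch; names are illustrative.

        weights  : flat weight vector of the trained network.
        eval_acc : callable, eval_acc(mask) -> training classification rate
                   with weights[i] treated as zero wherever mask[i] is False.
        min_acc  : predefined lower bound on the training classification rate.
        """
        mask = np.ones(weights.size, dtype=bool)
        for i in np.argsort(np.abs(weights)):  # smallest magnitudes first
            mask[i] = False                    # tentatively remove weight i
            if eval_acc(mask) < min_acc:       # removal hurts too much:
                mask[i] = True                 # undo it and stop pruning
                break
        return mask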

Cite this article

Medeiros, C.M.S., Barreto, G.A. A novel weight pruning method for MLP classifiers based on the MAXCORE principle. Neural Comput & Applic 22, 71–84 (2013). https://doi.org/10.1007/s00521-011-0748-6
