
Feature importance measure of a multilayer perceptron based on the presingle-connection layer

  • Regular Paper
  • Published:
Knowledge and Information Systems

Abstract

In many fields, the interpretability of machine learning models is as important as their prediction accuracy. A multilayer perceptron (MLP) neural network can deliver highly accurate predictions, but its lack of interpretability constrains its application in high-risk fields. To address this issue, this paper introduces an MLP with a presingle-connection layer (SMLP). The SMLP prepends a single-to-single connection layer with the ReLU function to the original MLP. By examining the weights of this single-connection layer after the model is trained, the importance of the input features can be determined. The experimental results demonstrate that this method accurately measures feature importance in the MLP. Compared with other widely used feature importance algorithms, it offers a straightforward theory, practical implementation, strong stability, and high reliability. Moreover, this measure helps open the black box of the MLP, indicates the influence of the input features on the prediction, and provides a quantitative criterion for feature selection in the MLP.
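To make the idea above concrete, here is a minimal sketch of the SMLP architecture in PyTorch, assuming an elementwise interpretation of the single-to-single connection layer; the layer sizes and the `feature_importance` helper are illustrative choices, not the authors' reference implementation.

```python
# Minimal SMLP sketch: an elementwise single-to-single weight layer with
# ReLU is prepended to a plain MLP. Assumptions: PyTorch, the elementwise
# form of the presingle layer, and all layer sizes are illustrative.
import torch
import torch.nn as nn

class SMLP(nn.Module):
    def __init__(self, n_features: int, n_hidden: int, n_out: int):
        super().__init__()
        # One weight per input feature ("single-to-single" connections).
        self.single = nn.Parameter(torch.ones(n_features))
        self.mlp = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature by its own weight, apply ReLU, then the MLP.
        return self.mlp(torch.relu(x * self.single))

    def feature_importance(self) -> torch.Tensor:
        # After training, the learned per-feature weights act as importance
        # scores (normalized here so they sum to one).
        w = self.single.detach().abs()
        return w / w.sum()
```

After training the SMLP on the task, inspecting `model.feature_importance()` ranks the input features; features whose single-connection weights shrink toward zero contribute little to the prediction.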



Acknowledgements

This research was supported by the National Natural Science Foundation of China [Grant Nos. 41872214 and 42072232].

Author information


Contributions

WZ, LZ and XS wrote the main manuscript text. WZ and ZY helped to write code and conduct experiments. WZ, LZ, XS, HZ, JS and XZ helped to revise the paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lejun Zou.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

See Tables 6, 7 and 8.

Table 6 Training and testing accuracy of MLP and SMLP for simulation datasets
Table 7 Training and testing accuracy of MLP and SMLP for real-world datasets
Table 8 Feature importance for real-world datasets based on SMLP

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Zhang, W., Shen, X., Zhang, H. et al. Feature importance measure of a multilayer perceptron based on the presingle-connection layer. Knowl Inf Syst 66, 511–533 (2024). https://doi.org/10.1007/s10115-023-01959-7

