Feature importance measure of a multilayer perceptron based on the presingle-connection layer

Zhang, Wenyi; Shen, Xiaohua; Zhang, Haoran; Yin, Zhaohui; Sun, Jiayu; Zhang, Xisheng; Zou, Lejun

doi:10.1007/s10115-023-01959-7

Regular Paper
Published: 04 September 2023

Volume 66, pages 511–533, (2024)
Knowledge and Information Systems

Wenyi Zhang¹,
Xiaohua Shen¹,
Haoran Zhang¹,
Zhaohui Yin¹,
Jiayu Sun¹,
Xisheng Zhang¹ &
Lejun Zou¹

Abstract

In many fields, the interpretability of machine learning models holds equal importance to their prediction accuracy. Highly accurate predictions are possible with a multilayer perceptron (MLP) neural network, but its application in high-risk fields is constrained by its lack of interpretability. To solve this issue, this paper introduces an MLP with a presingle-connection layer (SMLP). The SMLP incorporates a single-to-single connection layer with the ReLU function before the original MLP. By examining the weights of the single-connection layer after training the model, the significance of the input features can be determined. The experimental results demonstrate that this method can accurately measure the feature importance with the MLP. It offers advantages such as a straightforward theory, practical implementation, strong stability, and high reliability when compared with other widely used feature importance algorithms. Moreover, this measure effectively reveals the black box of the MLP, indicates the influence of input features on the prediction, and provides a quantitative standard for feature selection in MLP.
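
The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `SMLPSketch`, the absence of a bias term in the single-connection layer, the initialisation of its weights to 1, the hidden-layer size, and the normalisation of absolute weights into importance scores are all assumptions made for illustration. The sketch also assumes the inputs have been scaled to be non-negative, since the ReLU would otherwise zero out negative products. Training is omitted; only the forward pass and the reading of importance from the single-connection weights are shown.

```python
import numpy as np


def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)


class SMLPSketch:
    """Hypothetical sketch: a one-to-one input layer followed by a small MLP."""

    def __init__(self, n_features, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # One weight per input feature (assumption: no bias, initialised to 1).
        self.w_single = np.ones(n_features)
        # Ordinary fully connected layers of the MLP that follows.
        self.W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_outputs))
        self.b2 = np.zeros(n_outputs)

    def forward(self, X):
        # Single-to-single connection: feature j is multiplied only by weight j,
        # then passed through ReLU before entering the MLP.
        s = relu(X * self.w_single)
        h = relu(s @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

    def feature_importance(self):
        # Assumed scoring rule: absolute single-connection weights, scaled to sum to 1.
        mag = np.abs(self.w_single)
        total = mag.sum()
        return mag / total if total > 0 else mag


model = SMLPSketch(n_features=3, n_hidden=4, n_outputs=2)
model.w_single = np.array([2.0, 1.0, 1.0])  # stand-in for trained weights
print(model.feature_importance())  # scores sum to 1; feature 0 gets 0.5
```

Because each input feature has exactly one weight in the first layer, the importance reading is a single vector of length `n_features`, with no need to aggregate over hidden units.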

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant Nos. 41872214 and 42072232).

Author information

Authors and Affiliations

Key Laboratory of Geoscience Big Data and Deep Resource of Zhejiang Province, Zhejiang University, Yuhangtang Road, Hangzhou, 310058, Zhejiang, China
Wenyi Zhang, Xiaohua Shen, Haoran Zhang, Zhaohui Yin, Jiayu Sun, Xisheng Zhang & Lejun Zou

Contributions

WZ, LZ and XS wrote the main manuscript text. WZ and ZY helped to write code and conduct experiments. WZ, LZ, XS, HZ, JS and XZ helped to revise the paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lejun Zou.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

See Tables 6, 7 and 8.

Table 6 Training and testing accuracy of MLP and SMLP for simulation datasets

Table 7 Training and testing accuracy of MLP and SMLP for real-world datasets

Table 8 Feature importance for real-world datasets based on SMLP

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, W., Shen, X., Zhang, H. et al. Feature importance measure of a multilayer perceptron based on the presingle-connection layer. Knowl Inf Syst 66, 511–533 (2024). https://doi.org/10.1007/s10115-023-01959-7

Received: 20 January 2023
Revised: 27 July 2023
Accepted: 05 August 2023
Published: 04 September 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10115-023-01959-7
