Towards Refined Classifications Driven by SHAP Explanations

Arslan, Yusuf; Lebichot, Bertrand; Allix, Kevin; Veiber, Lisa; Lefebvre, Clément; Boytsov, Andrey; Goujon, Anne; Bissyandé, Tegawendé F.; Klein, Jacques

doi:10.1007/978-3-031-14463-9_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13480)

Included in the following conference series:

International Cross-Domain Conference for Machine Learning and Knowledge Extraction

Abstract

Machine Learning (ML) models are inherently approximate; as a result, their predictions can be wrong. In applications where errors can jeopardize a company’s reputation, human experts often have to manually check the alarms raised by ML models, as wrong or delayed decisions can have a significant business impact. These experts often use interpretable ML tools to verify predictions. However, post-prediction verification is also costly. In this paper, we hypothesize that the outputs of interpretable ML tools, such as SHAP explanations, can be exploited by machine learning techniques to improve classifier performance. By doing so, the cost of the post-prediction analysis can be reduced. To test this hypothesis, we conduct several experiments in which we use SHAP explanations directly as new features. In particular, using nine datasets, we first compare the performance of these “SHAP features” against traditional “base features” on binary classification tasks. Then, we add a second-step classifier relying on SHAP features, with the goal of reducing the false-positive and false-negative results of typical classifiers. We show that SHAP explanations used as SHAP features can help to improve classification performance, especially for false-negative reduction.
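
The two-step idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scoring function `first_step_score`, the toy data, the 0.5 threshold, the mean-instance baseline, and the nearest-centroid second step are all hypothetical. SHAP values are computed exactly by enumerating feature coalitions, which is only practical for a handful of features; the abstract does not say which SHAP estimator or classifiers the authors used. The sketch only shows false-positive filtering of raised alerts.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def first_step_score(x):
    # Hypothetical first-step model (an assumption, not the paper's classifier).
    return 0.6 * x[0] + 0.4 * x[1] * x[2]


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Invented toy data: three base features per instance, binary labels.
X = [[0.9, 0.8, 0.9], [0.8, 0.1, 0.2], [0.2, 0.9, 0.9], [0.1, 0.2, 0.1],
     [0.7, 0.9, 0.8], [0.9, 0.2, 0.1], [0.3, 0.3, 0.2], [0.6, 0.7, 0.9]]
y = [1, 0, 1, 0, 1, 0, 0, 1]
baseline = centroid(X)  # assumed reference point: the mean instance

# Step 0: turn each instance's explanation into a "SHAP feature" vector.
shap_features = [shapley_values(first_step_score, x, baseline) for x in X]

# Step 1: the first-step model raises alerts above an assumed threshold.
THRESHOLD = 0.5
alerts = [k for k, x in enumerate(X) if first_step_score(x) >= THRESHOLD]

# Step 2: a nearest-centroid classifier, fitted on the SHAP features of the
# alerts, keeps only the alerts that look like true positives.
pos_centroid = centroid([shap_features[k] for k in alerts if y[k] == 1])
neg_centroid = centroid([shap_features[k] for k in alerts if y[k] == 0])
confirmed = [k for k in alerts
             if sq_dist(shap_features[k], pos_centroid)
             < sq_dist(shap_features[k], neg_centroid)]

print("first-step alerts:", alerts)
print("alerts kept by second step:", confirmed)
```

For brevity the second step is fitted and applied on the same toy instances, so the sketch shows the data flow rather than a fair evaluation; a real experiment would fit it on held-out data. Applying the same pattern to the instances the first step did not flag would target false negatives instead.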


Author information

Authors and Affiliations

SnT – University of Luxembourg, Esch-sur-Alzette, Luxembourg
Yusuf Arslan, Bertrand Lebichot, Kevin Allix, Lisa Veiber, Tegawendé F. Bissyandé & Jacques Klein
BGL BNP Paribas, Luxembourg, Luxembourg
Clément Lefebvre, Andrey Boytsov & Anne Goujon

Corresponding author

Correspondence to Yusuf Arslan.

Editor information

Editors and Affiliations

University of Natural Resources and Life Sciences Vienna, Vienna, Austria
Andreas Holzinger
St. Pölten University of Applied Sciences, St. Pölten, Austria
Peter Kieseberg
TU Wien, Vienna, Austria
A Min Tjoa
SBA Research, Vienna, Austria
Edgar Weippl

About this paper

Cite this paper

Arslan, Y. et al. (2022). Towards Refined Classifications Driven by SHAP Explanations. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_5

DOI: https://doi.org/10.1007/978-3-031-14463-9_5
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14462-2
Online ISBN: 978-3-031-14463-9
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Federation for Information Processing
