Abstract
Machine Learning (ML) models are inherently approximate; as a result, the predictions of an ML model can be wrong. In applications where errors can jeopardize a company’s reputation, human experts often have to manually check the alarms raised by ML models, as wrong or delayed decisions can have a significant business impact. These experts often use interpretable ML tools to verify predictions. However, post-prediction verification is also costly. In this paper, we hypothesize that the outputs of interpretable ML tools, such as SHAP explanations, can be exploited by machine learning techniques to improve classifier performance, thereby reducing the cost of post-prediction analysis. To test this hypothesis, we conduct several experiments in which we use SHAP explanations directly as new features. In particular, using nine datasets, we first compare the performance of these “SHAP features” against traditional “base features” on binary classification tasks. Then, we add a second-step classifier relying on SHAP features, with the goal of reducing the false-positive and false-negative results of typical classifiers. We show that SHAP explanations used as SHAP features can help to improve classification performance, especially for false-negative reduction.
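The data flow described in the abstract (first-step classifier, SHAP explanations reused as features, second-step classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data is synthetic, both steps use a hand-written logistic regression (`fit_logreg` is a hypothetical helper), and SHAP values are computed with the closed form for linear models with independent features, `phi_j = w_j * (x_j - mean_j)`, rather than with the `shap` library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary task (assumption: not one of the paper's nine datasets).
n, d = 400, 4
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (X @ true_w + rng.normal(scale=1.0, size=n) > 0).astype(int)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]


def fit_logreg(X, y, lr=0.1, steps=500):
    """Gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def linear_shap(X, w, background):
    """SHAP values of a linear model in log-odds space.

    Assumes independent features: phi_ij = w_j * (x_ij - mean_j),
    where mean_j is taken over the background (training) data.
    """
    return w * (X - background.mean(axis=0))


# Step 1: train the first classifier on the base features.
w1, b1 = fit_logreg(X_train, y_train)

# Turn explanations into features: one SHAP value per base feature.
shap_train = linear_shap(X_train, w1, X_train)
shap_test = linear_shap(X_test, w1, X_train)

# Sanity check (local accuracy): base value + sum of SHAP values
# reproduces the model's log-odds output for every instance.
base_value = X_train.mean(axis=0) @ w1 + b1
margin_test = X_test @ w1 + b1
local_accuracy_ok = np.allclose(base_value + shap_test.sum(axis=1), margin_test)

# Step 2: train a second classifier on the SHAP features only.
w2, b2 = fit_logreg(shap_train, y_train)
pred_step1 = (margin_test > 0).astype(int)
pred_step2 = (shap_test @ w2 + b2 > 0).astype(int)

print("local accuracy holds:", local_accuracy_ok)
print("step-1 accuracy:", np.mean(pred_step1 == y_test))
print("step-2 accuracy:", np.mean(pred_step2 == y_test))
```

With a linear first-step model, the SHAP features are only a centered, rescaled copy of the base features, so this sketch is not expected to show any gain; it only illustrates how explanation outputs can be fed back in as a feature matrix of the same shape as the input.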
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Arslan, Y. et al. (2022). Towards Refined Classifications Driven by SHAP Explanations. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_5
DOI: https://doi.org/10.1007/978-3-031-14463-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14462-2
Online ISBN: 978-3-031-14463-9
eBook Packages: Computer Science, Computer Science (R0)