Abstract
There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by the fraud detection model need to be explainable in the event of a customer dispute. Additionally, the decision-making process of the model must be transparent to win the trust of regulators, analysts, and business stakeholders. At the same time, fraud detection solutions can benefit from data due to the noisy and dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations, and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the resulting functions are concise, closed-form mathematical expressions, the model is inherently explainable both at the level of a single classification decision and at the level of its overall decision process. Furthermore, the class imbalance problem is successfully addressed by optimizing for metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the problematic oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows for an explicit trade-off between prediction accuracy and explainability. An evaluation on the PaySim data set demonstrates predictive performance competitive with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.
E. Acar and F. den Hengst—Equal contribution.
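The abstract describes DSC at a high level; to make the core idea concrete, the following Python sketch is our own illustration (not the authors' implementation; all names and data are placeholders) of how a candidate closed-form expression can act as a binary classifier and how an imbalance-robust metric such as the F1 score can serve as the reward that guides the search.

```python
# Minimal sketch (illustration only): a candidate closed-form expression is
# turned into a binary classifier by squashing its output through a sigmoid
# and thresholding; the F1 score of the resulting predictions is the kind of
# imbalance-robust reward that can guide the search over expressions.
import numpy as np
from sklearn.metrics import f1_score


def classify(expression, X, threshold=0.5):
    """Map the real-valued output of a closed-form expression to class labels."""
    raw = expression(X)                    # e.g. x1 - 2.0 * x3
    prob = 1.0 / (1.0 + np.exp(-raw))      # squash to (0, 1)
    return (prob >= threshold).astype(int)


def reward(expression, X, y_true):
    """Reward of a candidate expression: F1 score of its predictions."""
    return f1_score(y_true, classify(expression, X))


def candidate(X):
    """Hand-written example of a concise, human-readable expression."""
    return X[:, 0] - 2.0 * X[:, 2]


# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 2.0 * X[:, 2]).astype(int)
print(reward(candidate, X, y))
```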
Notes
- 1.
Source code available at https://github.com/samanthav24/DSC_Fraud_Detection.
- 3.
It is important to acknowledge that the insights derived from the PaySim data set do not necessarily reflect current fraudulent behavior.
Acknowledgements
We kindly thank Wim Tip for sharing his expertise on fraud detection and the anonymous reviewers for their useful suggestions for improving this work. Floris den Hengst is generously funded by the NWO Hybrid Intelligence Project (024.004.022).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendices
A Baseline Model Configuration
The training set was randomly undersampled to achieve a balanced training set. Both the balanced training set and the original training set were used to train the baseline models. Subsequently, these models were tested on an unbalanced test set. The parameters of the baseline models are displayed in Table 4.
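As a minimal sketch of this protocol (our illustration only; the logistic regression is a stand-in for any baseline, the isFraud label and data frame names are placeholders, and the actual baseline configurations are those of Table 4):

```python
# Sketch of the baseline protocol: undersample the training set to a 1:1
# class ratio, train a baseline on both the balanced and the original
# training set, and evaluate on the untouched, imbalanced test set.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score


def undersample(train: pd.DataFrame, label: str = "isFraud", seed: int = 42) -> pd.DataFrame:
    """Keep all fraud cases and an equally sized random sample of legitimate ones."""
    fraud = train[train[label] == 1]
    legit = train[train[label] == 0].sample(n=len(fraud), random_state=seed)
    return pd.concat([fraud, legit]).sample(frac=1.0, random_state=seed)


def evaluate(train: pd.DataFrame, test: pd.DataFrame, label: str = "isFraud") -> float:
    """Train one baseline (here a stand-in logistic regression) and score it on the test set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(train.drop(columns=[label]), train[label])
    return f1_score(test[label], model.predict(test.drop(columns=[label])))


# Usage (train_df and test_df come from the preprocessing in Appendix B):
# f1_original = evaluate(train_df, test_df)
# f1_balanced = evaluate(undersample(train_df), test_df)
```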
B Preprocessing the PaySim Dataset
The following steps were taken to preprocess the data set (a code sketch summarizing them follows the list):
- Certain transactions in the data set exhibited non-zero amounts but had corresponding old and new balances of zero. To address this, we introduced the features externalOrig and externalDest for the customer and recipient accounts, respectively (see Table 5 for further details). We then imputed the balances according to the following relationships:
  $$\begin{aligned} & \textit{newbalanceDest} = \textit{oldbalanceDest} + \textit{amount}\\ & \textit{oldbalanceOrig} = \textit{newbalanceOrig} + \textit{amount} \end{aligned}$$
- Additional features were obtained by aggregating over the data set. Descriptions of these features are given in Table 5.
- The features nameOrig, nameDest and isFlaggedFraud were discarded.
- The feature type was one-hot encoded.
- The data was randomly split into a training, validation, and test set comprising 75%, 10%, and 15% of the data, respectively.
- A standard scaler was fitted on the numerical columns of the training set. Subsequently, the numerical columns of the training, validation, and test sets were scaled using this fitted scaler.
- For some of the baseline models, an additional balanced training set was generated by randomly undersampling the training data. Specifically, all fraudulent transactions were retained and an equal number of legitimate transactions was randomly selected to match the count of fraudulent instances.
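The following Python sketch is our own summary of these steps (column names follow the text above; the aggregation features of Table 5 are omitted and the exact implementation may differ):

```python
# Sketch of the preprocessing listed above: balance imputation, dropping of
# identifier columns, one-hot encoding, a 75/10/15 split, and scaling fitted
# on the training set only.
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUM_COLS = ["amount", "oldbalanceOrig", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"]


def preprocess(df: pd.DataFrame, seed: int = 42):
    df = df.drop(columns=["nameOrig", "nameDest", "isFlaggedFraud"]).copy()

    # Impute zero balances on transactions with a non-zero amount.
    dest = (df["oldbalanceDest"] == 0) & (df["newbalanceDest"] == 0) & (df["amount"] > 0)
    df.loc[dest, "newbalanceDest"] = df.loc[dest, "oldbalanceDest"] + df.loc[dest, "amount"]
    orig = (df["oldbalanceOrig"] == 0) & (df["newbalanceOrig"] == 0) & (df["amount"] > 0)
    df.loc[orig, "oldbalanceOrig"] = df.loc[orig, "newbalanceOrig"] + df.loc[orig, "amount"]

    # One-hot encode the transaction type.
    df = pd.get_dummies(df, columns=["type"])

    # Shuffle, then split into 75% / 10% / 15% train / validation / test.
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(df)
    train = df.iloc[: int(0.75 * n)].copy()
    val = df.iloc[int(0.75 * n): int(0.85 * n)].copy()
    test = df.iloc[int(0.85 * n):].copy()

    # Fit the scaler on the training set only, then apply it to all splits.
    scaler = StandardScaler().fit(train[NUM_COLS])
    for split in (train, val, test):
        split[NUM_COLS] = scaler.transform(split[NUM_COLS])
    return train, val, test
```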
Here we briefly describe and motivate some modeling decisions made in the experiments. In all experiments we aim to incorporate aggregation features that encompass all previous transactions of both the customer and the recipient, providing insight into their overall behavior patterns. The PaySim data set covers 30 days of transactions, which means that a large fraction of the account holders participate in only a small number of transactions. As a consequence, aggregation features may not accurately describe an individual's overall behavior. To address this issue, we assume that subsequent transactions are independent of the current transaction: they primarily reflect the individual's general behavior and exhibit similar distributions as those observed in previous (yet unseen) months. Therefore, we also include future transactions in certain aggregation features. Thus, for each transaction, we add features that capture the mean and maximum transaction amount of both the customer and the recipient over the entire data set. This approach carries a risk of data leakage, as the features of earlier transactions may contain information from subsequent time steps. However, we argue that future transaction information primarily reflects general user behavior and therefore does not constitute a form of data leakage. To reflect that these features model overall customer behavior, and to reduce the risk of data leakage even further, we add Gaussian noise to aggregation features that contain future information.
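A minimal sketch of such aggregation features, assuming the raw PaySim account identifiers and an illustrative noise scale (the feature names and the noise level actually used are not specified here):

```python
# Sketch: per-account mean and maximum transaction amount over the full data
# set, perturbed with additive Gaussian noise so the features describe overall
# behaviour rather than leaking exact future values.
import numpy as np
import pandas as pd


def add_aggregation_features(df: pd.DataFrame, noise_scale: float = 0.05, seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    df = df.copy()
    for party in ("Orig", "Dest"):            # customer and recipient accounts
        grouped = df.groupby(f"name{party}")["amount"]
        for stat in ("mean", "max"):
            col = f"{stat}Amount{party}"      # e.g. meanAmountOrig (illustrative name)
            df[col] = grouped.transform(stat)
            df[col] += rng.normal(0.0, noise_scale * df[col].std(), size=len(df))
    return df
```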
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Visbeek, S., Acar, E., den Hengst, F. (2024). Explainable Fraud Detection with Deep Symbolic Classification. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2155. Springer, Cham. https://doi.org/10.1007/978-3-031-63800-8_18
DOI: https://doi.org/10.1007/978-3-031-63800-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63799-5
Online ISBN: 978-3-031-63800-8