Abstract
There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by the fraud detection model need to be explainable in the event of a customer dispute. Additionally, the decision-making process of the model must be transparent to win the trust of regulators, analysts, and business stakeholders. At the same time, fraud detection solutions can benefit from data due to the noisy and dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations, and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the resulting functions are concise, closed-form mathematical expressions, the model is inherently explainable both at the level of a single classification decision and at the level of its overall decision process. Furthermore, the class imbalance problem is successfully addressed by optimizing for metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the problematic oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows for an explicit trade-off between prediction accuracy and explainability. An evaluation on the PaySim data set demonstrates predictive performance competitive with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.
E. Acar and F. den Hengst—Equal contribution.
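The abstract describes DSC at a high level; to make the core idea concrete, the following Python sketch is our own illustration (not the authors' implementation; all names and data are placeholders) of how a candidate closed-form expression can act as a binary classifier and how an imbalance-robust metric such as the F1 score can serve as the reward that guides the search.

```python
# Minimal sketch (illustration only): a candidate closed-form expression is
# turned into a binary classifier by squashing its output through a sigmoid
# and thresholding; the F1 score of the resulting predictions is the kind of
# imbalance-robust reward that can guide the search over expressions.
import numpy as np
from sklearn.metrics import f1_score


def classify(expression, X, threshold=0.5):
    """Map the real-valued output of a closed-form expression to class labels."""
    raw = expression(X)                    # e.g. x1 - 2.0 * x3
    prob = 1.0 / (1.0 + np.exp(-raw))      # squash to (0, 1)
    return (prob >= threshold).astype(int)


def reward(expression, X, y_true):
    """Reward of a candidate expression: F1 score of its predictions."""
    return f1_score(y_true, classify(expression, X))


def candidate(X):
    """Hand-written example of a concise, human-readable expression."""
    return X[:, 0] - 2.0 * X[:, 2]


# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 2.0 * X[:, 2]).astype(int)
print(reward(candidate, X, y))
```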
Notes
- 1.
Source code available at https://github.com/samanthav24/DSC_Fraud_Detection.
- 3.
It is important to acknowledge that the insights derived from the PaySim data set do not necessarily reflect current fraudulent behavior.
Acknowledgements
We kindly thank Wim Tip for sharing his expertise on fraud detection and the anonymous reviewers for their useful suggestions for improving this work. Floris den Hengst is generously funded by the NWO Hybrid Intelligence Project (024.004.022).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendices
A Baseline Model Configuration
The training set was randomly undersampled to achieve a balanced training set. Both the balanced training set and the original training set were used to train the baseline models. Subsequently, these models were tested on an unbalanced test set. The parameters of the baseline models are displayed in Table 4.
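As a minimal sketch of this protocol (our illustration only; the logistic regression is a stand-in for any baseline, the isFraud label and data frame names are placeholders, and the actual baseline configurations are those of Table 4):

```python
# Sketch of the baseline protocol: undersample the training set to a 1:1
# class ratio, train a baseline on both the balanced and the original
# training set, and evaluate on the untouched, imbalanced test set.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score


def undersample(train: pd.DataFrame, label: str = "isFraud", seed: int = 42) -> pd.DataFrame:
    """Keep all fraud cases and an equally sized random sample of legitimate ones."""
    fraud = train[train[label] == 1]
    legit = train[train[label] == 0].sample(n=len(fraud), random_state=seed)
    return pd.concat([fraud, legit]).sample(frac=1.0, random_state=seed)


def evaluate(train: pd.DataFrame, test: pd.DataFrame, label: str = "isFraud") -> float:
    """Train one baseline (here a stand-in logistic regression) and score it on the test set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(train.drop(columns=[label]), train[label])
    return f1_score(test[label], model.predict(test.drop(columns=[label])))


# Usage (train_df and test_df come from the preprocessing in Appendix B):
# f1_original = evaluate(train_df, test_df)
# f1_balanced = evaluate(undersample(train_df), test_df)
```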
B Preprocessing the PaySim Dataset
The following steps were taken to preprocess the data set (a code sketch summarizing them follows the list):
- Certain transactions in the data set exhibited non-zero amounts but had corresponding old and new balances of zero. To address this, we introduced the features externalOrig and externalDest for the customer and recipient accounts, respectively (see Table 5 for further details). We then imputed the balances according to the following relationships:
  $$\begin{aligned} & \textit{newbalanceDest} = \textit{oldbalanceDest} + \textit{amount}\\ & \textit{oldbalanceOrig} = \textit{newbalanceOrig} + \textit{amount} \end{aligned}$$
- Additional features were obtained by aggregating over the data set. Descriptions of these features are given in Table 5.
- The features nameOrig, nameDest and isFlaggedFraud were discarded.
- The feature type was one-hot encoded.
- The data was randomly split into a training, validation, and test set comprising 75%, 10%, and 15% of the data, respectively.
- A standard scaler was fitted on the numerical columns of the training set. Subsequently, the numerical columns of the training, validation, and test sets were scaled using this fitted scaler.
- For some of the baseline models, an additional balanced training set was generated by randomly undersampling the training data. Specifically, all fraudulent transactions were retained and an equal number of legitimate transactions was randomly selected to match the count of fraudulent instances.
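The following Python sketch is our own summary of these steps (column names follow the text above; the aggregation features of Table 5 are omitted and the exact implementation may differ):

```python
# Sketch of the preprocessing listed above: balance imputation, dropping of
# identifier columns, one-hot encoding, a 75/10/15 split, and scaling fitted
# on the training set only.
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUM_COLS = ["amount", "oldbalanceOrig", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"]


def preprocess(df: pd.DataFrame, seed: int = 42):
    df = df.drop(columns=["nameOrig", "nameDest", "isFlaggedFraud"]).copy()

    # Impute zero balances on transactions with a non-zero amount.
    dest = (df["oldbalanceDest"] == 0) & (df["newbalanceDest"] == 0) & (df["amount"] > 0)
    df.loc[dest, "newbalanceDest"] = df.loc[dest, "oldbalanceDest"] + df.loc[dest, "amount"]
    orig = (df["oldbalanceOrig"] == 0) & (df["newbalanceOrig"] == 0) & (df["amount"] > 0)
    df.loc[orig, "oldbalanceOrig"] = df.loc[orig, "newbalanceOrig"] + df.loc[orig, "amount"]

    # One-hot encode the transaction type.
    df = pd.get_dummies(df, columns=["type"])

    # Shuffle, then split into 75% / 10% / 15% train / validation / test.
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(df)
    train = df.iloc[: int(0.75 * n)].copy()
    val = df.iloc[int(0.75 * n): int(0.85 * n)].copy()
    test = df.iloc[int(0.85 * n):].copy()

    # Fit the scaler on the training set only, then apply it to all splits.
    scaler = StandardScaler().fit(train[NUM_COLS])
    for split in (train, val, test):
        split[NUM_COLS] = scaler.transform(split[NUM_COLS])
    return train, val, test
```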
Here we briefly describe and motivate some modeling decisions made in the experiments. In all experiments we aim to incorporate aggregation features that encompass all previous transactions of both the customer and the recipient, providing insight into their overall behavior patterns. The PaySim data set covers 30 days of transactions, which means that a large fraction of the account holders participate in only a small number of transactions. As a consequence, aggregation features may not accurately describe an individual's overall behavior. To address this issue, we assume that subsequent transactions are independent of the current transaction: they primarily reflect the individual's general behavior and exhibit similar distributions as those observed in previous (yet unseen) months. Therefore, we also include future transactions in certain aggregation features. Thus, for each transaction, we add features that capture the mean and maximum transaction amount of both the customer and the recipient over the entire data set. This approach carries a risk of data leakage, as the features of earlier transactions may contain information from subsequent time steps. However, we argue that future transaction information primarily reflects general user behavior and therefore does not constitute a form of data leakage. To reflect that these features model overall customer behavior, and to reduce the risk of data leakage even further, we add Gaussian noise to aggregation features that contain future information.
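A minimal sketch of such aggregation features, assuming the raw PaySim account identifiers and an illustrative noise scale (the feature names and the noise level actually used are not specified here):

```python
# Sketch: per-account mean and maximum transaction amount over the full data
# set, perturbed with additive Gaussian noise so the features describe overall
# behaviour rather than leaking exact future values.
import numpy as np
import pandas as pd


def add_aggregation_features(df: pd.DataFrame, noise_scale: float = 0.05, seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    df = df.copy()
    for party in ("Orig", "Dest"):            # customer and recipient accounts
        grouped = df.groupby(f"name{party}")["amount"]
        for stat in ("mean", "max"):
            col = f"{stat}Amount{party}"      # e.g. meanAmountOrig (illustrative name)
            df[col] = grouped.transform(stat)
            df[col] += rng.normal(0.0, noise_scale * df[col].std(), size=len(df))
    return df
```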
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Visbeek, S., Acar, E., den Hengst, F. (2024). Explainable Fraud Detection with Deep Symbolic Classification. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2155. Springer, Cham. https://doi.org/10.1007/978-3-031-63800-8_18
DOI: https://doi.org/10.1007/978-3-031-63800-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63799-5
Online ISBN: 978-3-031-63800-8