
Explainable Fraud Detection with Deep Symbolic Classification

  • Conference paper

Explainable Artificial Intelligence (xAI 2024)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2155)


Abstract

There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by the fraud detection model need to be explainable in the event of a customer dispute. Additionally, the decision-making process in the model must be transparent to win the trust of regulators, analysts, and business stakeholders. At the same time, fraud detection solutions can benefit from data-driven approaches due to the noisy and dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations, and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the resulting functions are concise, closed-form mathematical expressions, the model is inherently explainable both at the level of a single classification decision and at the level of the model’s overall decision process. Furthermore, the class imbalance problem is successfully addressed by optimizing for metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the problematic oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows for an explicit trade-off between prediction accuracy and explainability. An evaluation on the PaySim data set demonstrates predictive performance competitive with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.

E. Acar and F. den Hengst—Equal contribution.
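
For intuition, the minimal Python sketch below illustrates the scoring step described in the abstract: a candidate closed-form expression is mapped through a sigmoid, thresholded into a class label, and rewarded with the F1 score. It omits the expression sampler itself (in the paper, a neural network trained with reinforcement learning); the candidate function, variable names, and toy data are illustrative assumptions, not the authors' implementation.

```python
# Sketch of evaluating one candidate expression in a DSC-style search.
# Assumption: a candidate is given as a Python callable over a feature matrix.
import numpy as np
from sklearn.metrics import f1_score

def dsc_reward(expr, X, y, threshold=0.5):
    """Map the expression's real-valued output through a sigmoid, threshold it
    into hard class labels, and return the (imbalance-robust) F1 score."""
    z = expr(X)                           # real-valued output of the expression
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid maps output to (0, 1)
    y_hat = (p >= threshold).astype(int)  # classification decision
    return f1_score(y, y_hat)             # reward used to guide the search

# Hypothetical candidate expression, e.g. f(x) = x0 * x1 - x2
candidate = lambda X: X[:, 0] * X[:, 1] - X[:, 2]

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = (X_toy[:, 0] * X_toy[:, 1] > X_toy[:, 2]).astype(int)
print(dsc_reward(candidate, X_toy, y_toy))
```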


Notes

  1. Source code available at https://github.com/samanthav24/DSC_Fraud_Detection.

  2. See https://www.kaggle.com/datasets/ealaxi/paysim1/discussion/99799.

  3. It is important to acknowledge that the insights derived from the PaySim data set do not necessarily reflect current fraudulent behavior.

  4. https://www.abnamro.com/nl/nieuws/meer-over-financiele-criminaliteit.


Acknowledgements

We kindly thank Wim Tip for sharing his expertise on fraud detection and the anonymous reviewers for their useful suggestions for improving this work. Floris den Hengst is generously funded by the NWO Hybrid Intelligence Project (024.004.022).

Author information


Corresponding authors

Correspondence to Erman Acar or Floris den Hengst.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

A Baseline Model Configuration

The training set was randomly undersampled to achieve a balanced training set. Both the balanced training set and the original training set were used to train the baseline models. Subsequently, these models were tested on an unbalanced test set. The parameters of the baseline models are displayed in Table 4.

Table 4. Parameters of the baseline models
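
As a rough illustration of this setup (not the authors' code: the LogisticRegression baseline and the isFraud column name are placeholder assumptions, and the actual baseline parameters are those in Table 4), the undersampling and evaluation could look as follows:

```python
# Sketch of Appendix A's setup: balance the training set by random undersampling,
# train on both the balanced and the original training set, evaluate on the
# untouched (imbalanced) test set.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def undersample(train: pd.DataFrame, label: str = "isFraud", seed: int = 42) -> pd.DataFrame:
    """Keep all fraudulent rows plus an equally sized random sample of legitimate rows."""
    fraud = train[train[label] == 1]
    legit = train[train[label] == 0].sample(n=len(fraud), random_state=seed)
    return pd.concat([fraud, legit]).sample(frac=1, random_state=seed)  # shuffle

def fit_and_score(train: pd.DataFrame, test: pd.DataFrame, label: str = "isFraud") -> float:
    """Train a placeholder baseline and report F1 on the imbalanced test set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(train.drop(columns=[label]), train[label])
    return f1_score(test[label], model.predict(test.drop(columns=[label])))

# f1_balanced = fit_and_score(undersample(train_df), test_df)  # balanced training set
# f1_original = fit_and_score(train_df, test_df)               # original training set
```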

B Preprocessing the PaySim Dataset

The following steps were taken to preprocess the data set:

  • Certain transactions in the data set exhibited non-zero amounts, but had corresponding old and new balances of zero. To address this scenario, we introduced the features externalOrig and externalDest for the customer and recipient accounts, respectively (please refer to Table 5 for further details). Following this, we performed imputation of the balances according to the following relationships:

    $$\begin{aligned} & \textit{newbalanceDest} = \textit{oldbalanceDest} + \textit{amount}\\ & \textit{oldbalanceOrig} = \textit{newbalanceOrig} + \textit{amount} \end{aligned}$$
  • Additional features were obtained through aggregation techniques in the data set. Descriptions of these features are given in Table 5.

  • The features nameOrig, nameDest and isFlaggedFraud were discarded.

  • The feature type was one-hot encoded.

  • The data was randomly split into training, validation, and test sets comprising 75%, 10%, and 15% of the data, respectively.

  • A standard scaler was fitted on the numerical columns of the training set. Subsequently, the numerical columns of the training, validation, and test sets were scaled using this fitted scaler (a combined sketch of the split, encoding, and scaling steps is given after this list).

  • For some of the baseline models, an additional balanced training set was generated by randomly undersampling the training data. Specifically, all fraudulent transactions were retained and an equal number of legitimate transactions was randomly selected to match the count of fraudulent instances.
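
A combined sketch of the column dropping, one-hot encoding, 75/10/15 split, and scaling steps might look as follows. It is not the authors' preprocessing script; the column names simply follow the public PaySim schema, with isFraud as the label.

```python
# Sketch of the preprocessing steps listed above (illustrative, not the paper's code).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, seed: int = 42):
    """Drop unused columns, one-hot encode 'type', split 75/10/15, and scale
    numerical columns with a scaler fitted on the training set only."""
    df = df.drop(columns=["nameOrig", "nameDest", "isFlaggedFraud"])
    num_cols = [c for c in df.select_dtypes("number").columns if c != "isFraud"]
    df = pd.get_dummies(df, columns=["type"])    # one-hot encode the transaction type

    train, rest = train_test_split(df, test_size=0.25, random_state=seed)  # 75% train
    val, test = train_test_split(rest, test_size=0.6, random_state=seed)   # 10% / 15%

    scaler = StandardScaler().fit(train[num_cols])   # fit on the training set only
    train, val, test = (s.copy() for s in (train, val, test))
    for split in (train, val, test):
        split[num_cols] = scaler.transform(split[num_cols])
    return train, val, test
```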

Here we briefly describe and motivate some modeling decisions made in the experiments. In all experiments we aim to incorporate aggregation features that encompass all previous transactions of both the customer and the recipient, providing insight into their overall behavior patterns. The PaySim data set represents 30 days of transactions, which results in a large fraction of the account holders participating in only a small number of transactions. As a consequence, aggregation features may not accurately describe an individual’s overall behavior. To address this issue, we assume that subsequent transactions are independent of the current transaction: they primarily reflect the individual’s general behavior and exhibit distributions similar to those observed in previous (yet unseen) months. Therefore, we also include future transactions in certain aggregation features. Thus, for each transaction, we add features that capture the mean and maximum transaction amount over the entire data set for both the customer and the recipient. This approach carries a risk of data leakage, as earlier transactions may contain information from subsequent time steps through the balance features. However, we argue that future transaction information primarily reflects general user behavior and therefore does not constitute a form of data leakage. To reflect that these features model overall customer behavior, and to reduce the risk of data leakage even further, we add Gaussian noise to the aggregation features that contain future information.
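
A hedged sketch of these aggregation features is given below. It assumes the public PaySim column names, illustrative feature names, and an illustrative noise scale; none of these are taken from the paper.

```python
# Sketch of per-account aggregation features over the whole data set, plus Gaussian
# noise. Run this before nameOrig/nameDest are dropped in the preprocessing step.
import numpy as np
import pandas as pd

def add_aggregation_features(df: pd.DataFrame, noise_frac: float = 0.05, seed: int = 42) -> pd.DataFrame:
    """Add mean and max transaction amounts per customer and recipient, then perturb
    them with Gaussian noise to reflect that they model overall behavior and to
    limit leakage from future transactions."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for role, key in [("Orig", "nameOrig"), ("Dest", "nameDest")]:
        grouped = out.groupby(key)["amount"]
        for stat in ("mean", "max"):
            col = f"{stat}Amount{role}"          # e.g. meanAmountOrig, maxAmountDest
            out[col] = grouped.transform(stat)
            out[col] += rng.normal(0.0, noise_frac * out[col].std(), size=len(out))
    return out
```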

Table 5. Descriptions of the additional features that were added to the data set


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Visbeek, S., Acar, E., den Hengst, F. (2024). Explainable Fraud Detection with Deep Symbolic Classification. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2155. Springer, Cham. https://doi.org/10.1007/978-3-031-63800-8_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-63800-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63799-5

  • Online ISBN: 978-3-031-63800-8

  • eBook Packages: Computer Science, Computer Science (R0)
