An augmented AI-based hybrid fraud detection framework for invoicing platforms

Applied Intelligence

Abstract

In this era of e-commerce, many companies are moving to subscription-based invoicing platforms to manage their electronic invoices. Unfortunately, fraudsters exploit these platforms for various malicious activities. Identifying fraudsters is often challenging for companies because of limited time and other resources. A fully automated fraud detection model can help, but it carries a risk of false-positive identification. This paper proposes a hybrid fraud detection framework for settings in which only a small set of labelled (fraud/non-fraud) data is available and human input is required in the final decision-making step. The framework combines unsupervised and supervised machine learning, red-flag prioritization, and an augmented AI approach with a human-in-the-loop process. It also proposes a weighted center for the fraud risk cluster, based on feature importance scores, and uses it in the red-flag prioritization process. Finally, the approach is illustrated with a case study on identifying fraudulent users in an invoicing platform. The hybrid framework shows promising results in identifying fraudulent users and in improving human performance when human input is required to make the final decision.
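The abstract does not define the weighted center or the prioritization rule, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes the fraud risk cluster has already been found by an unsupervised step, that feature importance scores come from some supervised model, and that users are ranked by an importance-weighted Euclidean distance to the cluster centroid. All names (`cluster_center`, `weighted_distance`, `prioritize`) and the toy data are hypothetical.

```python
from math import sqrt


def cluster_center(points):
    """Plain centroid (per-feature mean) of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def weighted_distance(x, center, importances):
    """Euclidean distance in which each squared feature difference is
    scaled by that feature's normalized importance (an assumed weighting)."""
    total = sum(importances)
    weights = [w / total for w in importances]
    return sqrt(sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center)))


def prioritize(users, fraud_cluster_points, importances):
    """Return user ids ordered from closest to farthest from the
    fraud-risk cluster center, so reviewers see the riskiest first."""
    center = cluster_center(fraud_cluster_points)
    scored = [
        (weighted_distance(vec, center, importances), uid)
        for uid, vec in users.items()
    ]
    return [uid for _, uid in sorted(scored)]


# Hypothetical toy data: two features per user, already scaled to [0, 1].
fraud_cluster = [[0.9, 0.8], [1.0, 0.6], [0.8, 0.7]]
importances = [0.75, 0.25]  # assumed output of a supervised model
users = {"u1": [0.2, 0.7], "u2": [0.85, 0.1], "u3": [0.9, 0.7]}

review_queue = prioritize(users, fraud_cluster, importances)
print(review_queue)
```

In this toy example, user `u3` sits almost exactly on the cluster center and is queued first for human review, while `u1` differs most on the more important feature and is queued last. A human reviewer would still make the final fraud/non-fraud decision; the ranking only orders the queue.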

Data Availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.

Funding

We acknowledge support for this project from the Natural Sciences and Engineering Research Council (NSERC) Discovery program (Award Number: RGPIN-2020-06792) and the Mitacs Accelerate Fellowship Program (Award Number: IT16025).

Author information

Corresponding author

Correspondence to Dewan F. Wahid.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wahid, D.F., Hassini, E. An augmented AI-based hybrid fraud detection framework for invoicing platforms. Appl Intell 54, 1297–1310 (2024). https://doi.org/10.1007/s10489-023-05223-x
