Abstract
In this era of e-commerce, many companies are moving to subscription-based invoicing platforms to manage their electronic invoices. Unfortunately, fraudsters use these platforms for various malicious activities. Identifying fraudsters is often challenging for companies because of limited time and other resources. A fully automated fraud detection model can be useful, but it carries a risk of false-positive identification. This paper proposes a hybrid fraud detection framework for settings where only a small set of labelled (fraud/non-fraud) data is available and human input is required in the final decision-making step. The framework combines unsupervised and supervised machine learning, red-flag prioritization, and an augmented AI approach with a human-in-the-loop process. It also proposes a weighted center for the fraud risk cluster, based on feature importance scores, and uses it in the red-flag prioritization process. Finally, the approach is illustrated with a case study on identifying fraudulent users in an invoicing platform. The hybrid framework showed promising results in identifying fraudulent users and in improving human performance when human input is required to make the final decision.
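The abstract does not give the formula for the weighted center, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes the center is the per-feature mean of the fraud risk cluster, that feature importance scores act as weights in a Euclidean distance, and that users closer to the center are reviewed first. All function names, the toy data, and the importance values are hypothetical.

```python
from math import sqrt


def cluster_center(rows):
    """Per-feature mean of the rows assigned to the fraud-risk cluster."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def weighted_distance(row, center, weights):
    """Euclidean distance with each squared difference scaled by a feature weight."""
    return sqrt(sum(w * (x - c) ** 2 for x, c, w in zip(row, center, weights)))


def prioritize(candidates, center, importances):
    """Return candidate ids ordered from closest to farthest from the center."""
    total = sum(importances)
    weights = [imp / total for imp in importances]  # normalize so weights sum to 1
    scored = [
        (weighted_distance(row, center, weights), uid)
        for uid, row in candidates.items()
    ]
    return [uid for _, uid in sorted(scored)]


# Hypothetical toy data: two features per user, already scaled to [0, 1].
fraud_cluster = [[0.9, 0.1], [0.8, 0.3], [1.0, 0.2]]
# Assumed importance scores, e.g. from a tree-ensemble classifier.
importances = [0.75, 0.25]
candidates = {"u1": [0.9, 0.9], "u2": [0.2, 0.2], "u3": [0.85, 0.25]}

center = cluster_center(fraud_cluster)
review_queue = prioritize(candidates, center, importances)
print(review_queue)
```

In this sketch a deviation on the more important first feature pushes a user further down the queue than an equal deviation on the second, which is the kind of ordering a human reviewer could then work through from the top.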
Data Availability
Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.
Funding
We acknowledge support from the Natural Sciences and Engineering Research Council (NSERC) Discovery program (Award Number: RGPIN-2020-06792) and the Mitacs Accelerate Fellowship Program (Award Number: IT16025).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wahid, D.F., Hassini, E. An augmented AI-based hybrid fraud detection framework for invoicing platforms. Appl Intell 54, 1297–1310 (2024). https://doi.org/10.1007/s10489-023-05223-x
DOI: https://doi.org/10.1007/s10489-023-05223-x