An augmented AI-based hybrid fraud detection framework for invoicing platforms

Wahid, Dewan F.; Hassini, Elkafi

doi:10.1007/s10489-023-05223-x

Published: 04 January 2024

Volume 54, pages 1297–1310, (2024)

Applied Intelligence

Abstract

In this era of e-commerce, many companies are moving to subscription-based invoicing platforms to maintain their electronic invoices. Unfortunately, fraudsters use these platforms for various malicious activities. Identifying fraudsters is challenging for many companies because of limited time and other resources. A fully automated fraud detection model can help, but it carries a risk of false-positive identification. This paper proposes a hybrid fraud detection framework for settings where only a small set of labelled (fraud/non-fraud) data is available and human input is required in the final decision-making step. The framework combines unsupervised and supervised machine learning, red-flag prioritization, and an augmented AI approach with a human-in-the-loop process. It also proposes a weighted center for the fraud risk cluster, based on feature importance scores, and uses it in the red-flag prioritization process. Finally, the approach is illustrated with a case study on identifying fraudulent users in an invoicing platform. The hybrid framework showed promising results in identifying fraudulent users and in improving human performance when human input is required to make the final decision.
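
The abstract does not give the formula for the weighted center, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that feature importance scores are already available (for example, from a supervised model), that each feature is scaled by its normalized importance, and that users closer to the importance-weighted center of the fraud risk cluster are reviewed first. The function names, the toy data, and the Euclidean distance are all hypothetical choices made for illustration.

```python
from math import sqrt


def weighted_center(cluster_points, importances):
    """Return (center, weights) for a fraud risk cluster.

    Assumption: each feature mean is scaled by its normalized importance score.
    """
    total = sum(importances)
    weights = [score / total for score in importances]
    n = len(cluster_points)
    center = [
        weights[j] * sum(point[j] for point in cluster_points) / n
        for j in range(len(weights))
    ]
    return center, weights


def red_flag_ranking(users, center, weights):
    """Order user ids for human review, closest to the weighted center first.

    Assumption: plain Euclidean distance on importance-scaled features.
    """
    def distance(features):
        return sqrt(
            sum((w * x - c) ** 2 for w, x, c in zip(weights, features, center))
        )

    return sorted(users, key=lambda user_id: distance(users[user_id]))


# Toy data (hypothetical): two features, the first three times as important.
fraud_cluster = [[1.0, 0.0], [3.0, 0.0]]
importances = [3.0, 1.0]
center, weights = weighted_center(fraud_cluster, importances)

unlabelled_users = {
    "user_a": [2.0, 5.0],
    "user_b": [4.0, 0.0],
    "user_c": [2.0, 0.0],
}
ranking = red_flag_ranking(unlabelled_users, center, weights)
print(ranking)
```

In a human-in-the-loop setting, a ranking like this would only order the review queue; the final fraud/non-fraud decision would remain with the human reviewer.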

Data Availability

Due to the nature of this research, participants of this study did not agree to have their data shared publicly, so supporting data are not available.

Funding

We acknowledge support for this project from the Natural Sciences and Engineering Research Council (NSERC) Discovery program (Award Number: RGPIN-2020-06792) and the Mitacs Accelerate Fellowship Program (Award Number: IT16025).

Author information

Authors and Affiliations

School of Computational Science & Engineering, McMaster University, Hamilton, Canada
Dewan F. Wahid & Elkafi Hassini
DeGroote School of Business, McMaster University, Hamilton, Canada
Elkafi Hassini

Corresponding author

Correspondence to Dewan F. Wahid.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wahid, D.F., Hassini, E. An augmented AI-based hybrid fraud detection framework for invoicing platforms. Appl Intell 54, 1297–1310 (2024). https://doi.org/10.1007/s10489-023-05223-x

Accepted: 06 December 2023
Published: 04 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-023-05223-x
