Claims fraud detection with uncertain labels

Vandervorst, Félix; Verbeke, Wouter; Verdonck, Tim

doi:10.1007/s11634-023-00568-0

Regular Article
Published: 30 November 2023

Volume 18, pages 219–243 (2024)

Advances in Data Analysis and Classification

Abstract

Insurance fraud is a non-self-revealing type of fraud. The true historical labels (fraud or legitimate) are only as precise as the investigators' effort and success in uncovering them. Popular supervised and unsupervised learning approaches fail to capture the ambiguous nature of uncertain labels. Imprecisely observed labels can be represented in the Dempster–Shafer theory of belief functions, a generalization of supervised and unsupervised learning suited to representing uncertainty. In this paper, we show that partial information from historical investigations adds valuable, learnable information to the fraud detection system and improves its performance. We also show that belief function theory provides a flexible mathematical framework for concept drift detection and cost-sensitive learning, two common challenges in fraud detection. Finally, we present an application to real-world motor insurance claims fraud.
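
To make the idea of an uncertain label concrete, the following is a minimal sketch of a Dempster–Shafer mass function on the two-element frame {fraud, legitimate}. It is an illustration under stated assumptions, not the authors' implementation: the names Mass and should_investigate, the example mass values, and the cost rule (investigate when the expected recovered amount exceeds a fixed investigation cost) are all hypothetical. A fully investigated claim puts all mass on one class (the supervised case), an uninvestigated claim puts all mass on the whole frame (the unlabeled case), and a partially investigated claim sits in between. The pignistic transform used to obtain a single probability follows Smets (1989).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Mass function on the frame {fraud, legit}.

    `unknown` is the mass assigned to the whole frame, i.e. the part of the
    evidence that cannot distinguish fraud from legitimate. (Hypothetical
    helper for illustration only.)
    """
    fraud: float
    legit: float
    unknown: float

    def __post_init__(self):
        parts = (self.fraud, self.legit, self.unknown)
        if min(parts) < 0 or abs(sum(parts) - 1.0) > 1e-9:
            raise ValueError("masses must be non-negative and sum to 1")

    def belief_fraud(self) -> float:
        # Belief: mass committed to fraud and nothing else.
        return self.fraud

    def plausibility_fraud(self) -> float:
        # Plausibility: all mass not contradicting fraud.
        return self.fraud + self.unknown

    def pignistic_fraud(self) -> float:
        # Pignistic probability: split the uncommitted mass equally
        # between the two elements of the frame.
        return self.fraud + self.unknown / 2


def should_investigate(m: Mass, claim_amount: float, cost: float) -> bool:
    """Assumed cost rule: investigate if expected recovery exceeds the cost."""
    return m.pignistic_fraud() * claim_amount > cost


confirmed = Mass(fraud=1.0, legit=0.0, unknown=0.0)   # supervised label
unlabeled = Mass(fraud=0.0, legit=0.0, unknown=1.0)   # vacuous, no evidence
suspicious = Mass(fraud=0.6, legit=0.1, unknown=0.3)  # partial evidence

for name, m in [("confirmed", confirmed), ("unlabeled", unlabeled),
                ("suspicious", suspicious)]:
    print(name, m.belief_fraud(), m.plausibility_fraud(), m.pignistic_fraud())
```

The gap between belief and plausibility is what a single hard label or a single probability cannot express: for the uninvestigated claim it spans the whole interval from 0 to 1, while for the confirmed claim it collapses to a point.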

Author information

Authors and Affiliations

Faculty of Economics and Business, KU Leuven, Naamsestraat 69, Leuven, 3000, Belgium
Félix Vandervorst & Wouter Verbeke
Department of Mathematics, University of Antwerp, Middelheimlaan 1, Antwerp, 2020, Belgium
Félix Vandervorst & Tim Verdonck
Data Office, Allianz Benelux, Koning Albert II Laan 32, Brussels, 1000, Belgium
Félix Vandervorst

Corresponding author

Correspondence to Félix Vandervorst.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vandervorst, F., Verbeke, W. & Verdonck, T. Claims fraud detection with uncertain labels. Adv Data Anal Classif 18, 219–243 (2024). https://doi.org/10.1007/s11634-023-00568-0

Received: 09 January 2023
Accepted: 16 September 2023
Published: 30 November 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11634-023-00568-0

Mathematics Subject Classification

68T37 - Reasoning under uncertainty in the context of artificial intelligence
