Claims fraud detection with uncertain labels

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Insurance fraud is a non-self-revealing type of fraud: the true historical labels (fraud or legitimate) are only as precise as the investigators’ efforts and success in uncovering them. Popular supervised and unsupervised learning approaches fail to capture the ambiguous nature of uncertain labels. Imprecisely observed labels can be represented in the Dempster–Shafer theory of belief functions, which supports a generalization of supervised and unsupervised learning suited to representing uncertainty. In this paper, we show that partial information from historical investigations adds valuable, learnable information to the fraud detection system and improves its performance. We also show that belief function theory provides a flexible mathematical framework for concept drift detection and cost-sensitive learning, two common challenges in fraud detection. Finally, we present an application to real-world motor insurance claim fraud.
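To make the idea of an imprecisely observed label concrete, the following is a minimal sketch of a Dempster–Shafer mass function on the two-class frame {fraud, legitimate}. It is not the authors' implementation: the class name `SoftLabel`, its field names, and the numeric masses are hypothetical and chosen only for illustration. The sketch uses the standard definitions of belief, plausibility, and the pignistic probability for a two-element frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftLabel:
    """Mass function on the frame {fraud, legit} (hypothetical helper)."""

    m_fraud: float    # mass committed to {fraud}
    m_legit: float    # mass committed to {legit}
    m_unknown: float  # mass on the whole frame {fraud, legit} (ignorance)

    def __post_init__(self) -> None:
        masses = (self.m_fraud, self.m_legit, self.m_unknown)
        if min(masses) < 0 or abs(sum(masses) - 1.0) > 1e-9:
            raise ValueError("masses must be non-negative and sum to 1")

    def belief_fraud(self) -> float:
        # Belief: mass of all subsets contained in {fraud}.
        return self.m_fraud

    def plausibility_fraud(self) -> float:
        # Plausibility: mass of all subsets that intersect {fraud}.
        return self.m_fraud + self.m_unknown

    def pignistic_fraud(self) -> float:
        # Pignistic transform: split the ignorance mass equally
        # between the two classes.
        return self.m_fraud + self.m_unknown / 2


# A claim proven fraudulent: a certain label, as in supervised learning.
proven = SoftLabel(1.0, 0.0, 0.0)
# A claim never investigated: a vacuous label, as in unsupervised learning.
never_investigated = SoftLabel(0.0, 0.0, 1.0)
# A claim suspected but not proven (illustrative masses only).
suspected = SoftLabel(0.6, 0.0, 0.4)

for name, label in [("proven", proven),
                    ("never investigated", never_investigated),
                    ("suspected", suspected)]:
    print(name, label.belief_fraud(), label.plausibility_fraud(),
          label.pignistic_fraud())
```

The certain and vacuous labels are the two extremes of the same representation, which is the sense in which soft labels sit between the supervised and unsupervised settings. The gap between belief and plausibility measures how much of the label remains unresolved.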

Author information

Corresponding author

Correspondence to Félix Vandervorst.

About this article

Cite this article

Vandervorst, F., Verbeke, W. & Verdonck, T. Claims fraud detection with uncertain labels. Adv Data Anal Classif 18, 219–243 (2024). https://doi.org/10.1007/s11634-023-00568-0

  • DOI: https://doi.org/10.1007/s11634-023-00568-0
