Auto claim fraud detection using Bayesian learning neural networks

https://doi.org/10.1016/j.eswa.2005.04.030

Abstract

This article explores the explicative capabilities of neural network classifiers with automatic relevance determination weight regularization, and reports the findings from applying these networks to personal injury protection automobile insurance claim fraud detection. The automatic relevance determination objective function scheme provides a way to determine which inputs are most informative to the trained neural network model. An implementation of MacKay's (1992a,b) evidence framework approach to Bayesian learning is proposed as a practical way of training such networks. The empirical evaluation is based on a data set of closed claims from accidents that occurred in Massachusetts, USA during 1993.

Introduction

In recent years, the detection of fraudulent claims has grown into a high-priority and technology-laden problem for insurers (Viaene, 2002). Several sources speak of the increasing prevalence of insurance fraud and the sizeable proportions it has taken on (see, for example, Canadian Coalition Against Insurance Fraud, 2002, Coalition Against Insurance Fraud, 2002, Comité Européen des Assurances, 1996, Comité Européen des Assurances, 1997). In September 2002, a special issue of the Journal of Risk and Insurance (Derrig, 2002) was devoted to insurance fraud topics; it covers a significant part of previous and current technical research on insurance (claim) fraud prevention, detection and diagnosis.

More systematic electronic collection and organization of, and company-wide access to, coherent insurance data have stimulated data-driven initiatives aimed at analyzing and modeling the formal relations between combinations of fraud indicators and claim suspiciousness, so as to upgrade fraud detection with (semi-)automatic, intelligible, accountable tools. Machine learning and artificial intelligence solutions are increasingly explored for fraud prediction and diagnosis in the insurance domain. Still, little of this work has been published: most state-of-the-art practice and methodology on fraud detection remains well protected behind the thick walls of insurance companies, for reasons that are legion.

Viaene et al. (2002) reported on the results of a predictive performance benchmarking study. The study involved the task of learning to predict expert suspicion of personal injury protection (PIP) (no-fault) automobile insurance claim fraud. The data used consisted of closed real-life PIP claims from accidents that occurred in Massachusetts, USA during 1993 and that had previously been investigated for suspicion of fraud by domain experts. The study contrasted several instantiations of a spectrum of state-of-the-art supervised classification techniques, that is, techniques aimed at algorithmically learning to allocate data objects (input or feature vectors) to a priori defined object classes, based on a training set of data objects with known class or target labels. Among the considered techniques were neural network classifiers trained according to MacKay's (1992a) evidence framework approach to Bayesian learning. These neural networks were shown to score consistently among the best in all evaluated scenarios.

Statistical modeling techniques such as logistic regression and linear and quadratic discriminant analysis are widely used for modeling and prediction purposes. However, their predetermined functional form and restrictive (often unfounded) model assumptions limit their usefulness. The role of neural networks is to provide general and efficiently scalable parameterized nonlinear mappings between a set of input variables and a set of output variables (Bishop, 1995). Neural networks have been shown to be very promising alternatives for modeling complex nonlinear relationships (see, for example, Desai et al., 1996, Lacher et al., 1995, Lee et al., 1996, Mobley et al., 2000, Piramuthu, 1999, Salchenberger et al., 1997, Sharda and Wilson, 1996). This is especially true in situations where a lack of domain knowledge prevents any valid argument for an appropriate model selection bias.

Even though the modeling flexibility of neural networks makes them a very attractive and interesting alternative for pattern learning purposes, many practical problems remain when implementing them: What is the impact of the initial weight choice? How should the weight decay parameter be set? How can the network be prevented from fitting the noise in the training data? These and other issues are often dealt with in ad hoc ways. Nevertheless, they are crucial to the success of any neural network implementation. Another major objection to the use of neural networks for practical purposes remains their widely proclaimed lack of explanatory power: neural networks, it is said, are black boxes. In this article, Bayesian learning (Bishop, 1995, Neal, 1996) is suggested as a way to deal with these issues during neural network training in a principled, rather than ad hoc, fashion.

We set out to explore and demonstrate the explicative capabilities of neural network classifiers trained using an implementation of MacKay's (1992a) evidence framework approach to Bayesian learning for optimizing an automatic relevance determination (ARD) regularized objective function (MacKay, 1994, Neal, 1998). The ARD objective function scheme allows us to determine the relative importance of inputs to the trained model. The empirical evaluation in this article is based on the modeling work performed in the context of the baseline benchmarking study of Viaene et al. (2002).
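
To make the ARD scheme concrete, the following display sketches the form such an objective function typically takes (our notation, following MacKay, 1994; an illustration, not a reproduction of the article's equations):

\[
S(\mathbf{w}) \;=\; E_D(\mathbf{w}) \;+\; \sum_{m}\alpha_m\,E_W^{(m)}(\mathbf{w}),
\qquad
E_W^{(m)}(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{i\in\mathcal{G}_m} w_i^{2},
\]

where \(E_D\) is the data misfit (for classification, the cross-entropy error), \(\mathcal{G}_m\) indexes the weights fanning out of input \(m\), and each \(\alpha_m\) is a separate regularization hyperparameter that is adapted during Bayesian learning rather than set by hand.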

The importance of input relevance assessment needs no underlining. It is not uncommon for domain experts to ask which inputs are relatively more important: specifically, which inputs contribute most to the detection of insurance claim fraud? This is a very reasonable question. Methods for input selection are thus not only capable of improving human understanding of the problem domain, in casu the diagnosis of insurance claim fraud, but also allow for more efficient and lower-cost solutions. In addition, penalizing or eliminating (partially) redundant or irrelevant inputs may effectively counter the curse of dimensionality (Bellman, 1961). In practice, adding inputs (even relevant ones) beyond a certain point can actually reduce the performance of a predictive model: faced with the limited data availability typical in practice, increasing the dimensionality of the input space eventually leads to a situation where the space is so sparsely populated that the available data represent the true model very poorly. The ultimate objective of input selection is, therefore, to select the minimum number of inputs required to capture the structure in the data.
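
A back-of-the-envelope illustration (ours, not the article's): covering a \(d\)-dimensional input space at a resolution of \(k\) cells per dimension requires on the order of

\[
N \;\approx\; k^{\,d}
\]

training points. With \(k=10\) and the 37 inputs used later in this article (25 indicators plus 12 other inputs), that would call for roughly \(10^{37}\) points, whereas only 1,399 claims are available, which is why parsimonious input selection matters.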

This article is organized as follows. Section 2 revisits some basic theory on multilayer neural networks for classification. Section 3 elaborates on input relevance determination. The evidence framework approach to Bayesian learning for neural network classifiers is discussed in Section 4. This theoretical exposition is followed by an empirical evaluation. Section 5 describes the characteristics of the 1993 Massachusetts, USA PIP closed claims data that were used. Section 6 describes the setup of the empirical evaluation and reports its results. Section 7 concludes the article.

Section snippets

Neural networks for classification

Fig. 1 shows a simple three-layer neural network. It is made up of an input layer, a hidden layer and an output layer, each consisting of a number of processing units. The layers are interconnected by modifiable weights, represented by the links between the layers. A bias unit is connected to each unit other than the input units. The function of a processing unit is to accept signals along its incoming connections and (nonlinearly) transform a weighted sum of these signals, termed its
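
A minimal sketch (not the authors' code) of the forward pass of such a three-layer network, with a logistic output unit so that the network output can be read as an estimated class (fraud) probability:

```python
# Illustrative sketch of a three-layer MLP forward pass.
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """x: input vector; (W1, b1): hidden-layer weights and biases;
    (W2, b2): output-layer weights and bias."""
    hidden = np.tanh(W1 @ x + b1)         # weighted sum + nonlinear transfer
    output = W2 @ hidden + b2             # output-unit activation
    return 1.0 / (1.0 + np.exp(-output))  # logistic output in (0, 1)

# Toy usage: 5 inputs, 3 hidden units, 1 output unit.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W1, b1 = rng.normal(size=(3, 5)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
print(mlp_forward(x, W1, b1, W2, b2))
```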

Input relevance determination

The ARD objective function allows us to control the size of the weights associated with each input separately. Large α_m values suppress the weights exiting from the respective input and effectively switch its contribution to the functioning of the MLP classifier to a lower level. This means that all inputs can be rank ordered according to their optimized α_m values. Inputs associated with larger α_m values are less relevant to the neural network; the most relevant inputs will have the lowest α_m.
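
The resulting ranking step is trivial to express. The sketch below assumes the α_m values have already been optimized; the input names and values are made up purely for illustration (they are not the AIB indicators):

```python
# Illustrative sketch: rank inputs by optimized ARD hyperparameters.
# Smaller alpha_m means the fan-out weights of that input are less
# suppressed, i.e. the input is more relevant to the trained network.
alphas = {"input_A": 0.8, "input_B": 2.4, "input_C": 0.5, "input_D": 15.0}  # hypothetical values

ranking = sorted(alphas, key=alphas.get)  # ascending alpha = descending relevance
for rank, name in enumerate(ranking, start=1):
    print(rank, name, alphas[name])
```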

Evidence framework

The aim of Bayesian learning or Bayesian estimation (Bishop, 1995, Neal, 1996) is to develop probabilistic models that fit the data, and make optimal predictions using those models. The conceptual difference between Bayesian estimation and maximum likelihood estimation is that we no longer view model parameters as fixed, but rather treat them as random variables that are characterized by a joint probability model. This stresses the importance of capturing and accommodating for the inherent
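
In outline (our notation; see MacKay, 1992a for the full treatment), the posterior over the weights given the data D and hyperparameters α is

\[
p(\mathbf{w}\mid D,\alpha) \;=\;
\frac{p(D\mid \mathbf{w})\,p(\mathbf{w}\mid \alpha)}{p(D\mid \alpha)},
\]

and the evidence \(p(D\mid\alpha)\) in the denominator is maximized to re-estimate the hyperparameters, giving updates of the form

\[
\alpha \;\leftarrow\; \frac{\gamma}{2\,E_W(\mathbf{w}_{\mathrm{MP}})},
\qquad
\gamma \;=\; \sum_i \frac{\lambda_i}{\lambda_i+\alpha},
\]

where \(\mathbf{w}_{\mathrm{MP}}\) is the most probable weight vector, the \(\lambda_i\) are eigenvalues of the Hessian of the data error at \(\mathbf{w}_{\mathrm{MP}}\), and \(\gamma\) counts the well-determined parameters; with ARD, one such update is maintained per input group.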

PIP claims data

The empirical evaluation in Section 6 is based on a data set of 1,399 closed PIP automobile insurance claim files from accidents that occurred in Massachusetts, USA during 1993, and for which information was meticulously collected by the Automobile Insurers Bureau (AIB) of Massachusetts, USA. For all the claims the AIB tracked information on 25 binary fraud indicators (also known as red flags) and 12 nonindicator inputs, specifically, discretized continuous inputs, that are all supposed to make
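
Purely as an illustration of the shape of such data (hypothetical column names and random values, not the AIB variables), the 25 binary indicators and 12 discretized inputs can be assembled into a 1,399-by-37 feature matrix:

```python
# Illustrative only: column names and values are hypothetical.
import numpy as np
import pandas as pd

n_claims = 1399
rng = np.random.default_rng(1)
red_flags = pd.DataFrame(rng.integers(0, 2, size=(n_claims, 25)),
                         columns=[f"flag_{i+1}" for i in range(25)])
discretized = pd.DataFrame(rng.integers(0, 4, size=(n_claims, 12)),
                           columns=[f"cat_{i+1}" for i in range(12)])
X = pd.concat([red_flags, discretized], axis=1).to_numpy(dtype=float)
print(X.shape)  # (1399, 37)
```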

Empirical evaluation

In this section, we demonstrate the intelligible soft input selection capabilities of MLP-ARD using the 1993 Massachusetts, USA PIP automobile insurance closed claims data. The resulting input importance ranking will be compared with the rankings produced by two popular techniques, logistic regression and decision tree learning. For this study, we have used the models that were fitted to the data for the baseline benchmarking study of Viaene et al. (2002).

In the baseline benchmarking study, we contrasted the predictive
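
One simple way to compare such importance rankings across methods (an illustrative sketch with made-up rank positions, not the study's results) is a rank correlation:

```python
# Illustrative sketch: compare two input-importance rankings with
# Spearman's rank correlation; positions are hypothetical (1 = most important).
from scipy.stats import spearmanr

mlp_ard_rank = [1, 2, 3, 4, 5]
logit_rank   = [2, 1, 3, 5, 4]
rho, pval = spearmanr(mlp_ard_rank, logit_rank)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```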

Conclusion

Understanding the semantics that underlie the output of neural network models is an important aspect of their acceptance by domain experts for routine analysis and decision making purposes. Hence, we explored the explicative capabilities of neural network classifiers with automatic relevance determination weight regularization, and reported the findings of applying these networks to personal injury protection automobile insurance claim fraud detection. The regularization scheme was aimed

References (59)

  • C.M. Bishop

    Neural networks for pattern recognition

    (1995)
  • Breiman, L. (2001a). Understanding complex predictors. Invited talk at the nonparametrics in large, multidimensional...
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • L. Breiman et al.

    Classification and regression trees (CART)

    (1984)
  • Buntine, W.L. (1990). A theory of learning classification rules. PhD Thesis. School of Computing Science, University of...
  • Canadian Coalition Against Insurance Fraud. (2002). Insurance fraud,...
  • B. Cestnik

    Estimating probabilities: A crucial task in machine learning

    (1990)
  • Coalition Against Insurance Fraud (2002). Insurance fraud: The crime you pay for,...
  • Comité Européen des Assurances. (1996). The European insurance anti-fraud guide. CEA info special issue 4. Paris: Euro...
  • Comité Européen des Assurances. (1997). The European insurance anti-fraud guide 1997 update. CEA info special issue 5...
  • J.B. Copas

    Regression, prediction and shrinkage (with discussion)

    Journal of the Royal Statistical Society: Series B (Methodological), 45

    (1983)
  • Cussens, J. (1993). Bayes and pseudo-Bayes estimates of conditional probabilities and their reliability. Proceedings of...
  • Derrig, R. A. (Ed.) (2002). Special issue on insurance fraud. Journal of Risk and Insurance 69...
  • Derrig, R. A., & Weisberg, H. (1998). AIB PIP screening experiment final report—Understanding and improving the claim...
  • P. Diaconis et al.

    Computer-intensive methods in statistics

    Scientific American

    (1983)
  • Dietterich, T. G. (2002). Ensemble learning. In M.A. Arbib (Ed.), The handbook of brain theory and neural networks (2nd...
  • Domingos, P. (2000). Bayesian averaging of classifiers and the overfitting problem. Proceedings of the seventeenth...
  • Drummond, C., & Holte, R. C. (2000). Explicitly representing expected cost: An alternative to ROC representation....
  • R.O. Duda et al.

    Pattern classification

    (2000)