Auto claim fraud detection using Bayesian learning neural networks
Introduction
In recent years, the detection of fraudulent claims has blossomed into a high-priority and technology-laden problem for insurers (Viaene, 2002). Several sources speak of the increasing prevalence of insurance fraud and the sizeable proportions it has taken on (see, for example, Canadian Coalition Against Insurance Fraud, 2002, Coalition Against Insurance Fraud, 2002, Comité Européen des Assurances, 1996, Comité Européen des Assurances, 1997). In September 2002, a special issue of the Journal of Risk and Insurance (Derrig, 2002) was devoted to insurance fraud topics. It covers a significant part of past and current technical research directions regarding insurance (claim) fraud prevention, detection and diagnosis.
More systematic electronic collection and organization of, and company-wide access to, coherent insurance data have stimulated data-driven initiatives aimed at analyzing and modeling the formal relations between fraud indicator combinations and claim suspiciousness, with the goal of upgrading fraud detection with (semi-)automatic, intelligible, accountable tools. Machine learning and artificial intelligence solutions are increasingly being explored for fraud prediction and diagnosis in the insurance domain. Still, little of this work has been published. Most state-of-the-art practice and methodology on fraud detection remains well protected behind the thick walls of insurance companies. The reasons are legion.
Viaene et al. (2002) reported on the results of a predictive performance benchmarking study. The study involved the task of learning to predict expert suspicion of personal injury protection (PIP) (no-fault) automobile insurance claim fraud. The data consisted of closed real-life PIP claims from accidents that occurred in Massachusetts, USA during 1993, and that had previously been investigated for suspicion of fraud by domain experts. The study contrasted several instantiations of a spectrum of state-of-the-art supervised classification techniques, that is, techniques aimed at algorithmically learning to allocate data objects (input or feature vectors) to a priori defined object classes, based on a training set of data objects with known class or target labels. Among the considered techniques were neural network classifiers trained according to MacKay's (1992a) evidence framework approach to Bayesian learning. These neural networks were shown to consistently score among the best for all evaluated scenarios.
Statistical modeling techniques such as logistic regression and linear and quadratic discriminant analysis are widely used for modeling and prediction purposes. However, their predetermined functional form and restrictive (often unfounded) model assumptions limit their usefulness. The role of neural networks is to provide general and efficiently scalable parameterized nonlinear mappings between a set of input variables and a set of output variables (Bishop, 1995). Neural networks have been shown to be very promising alternatives for modeling complex nonlinear relationships (see, for example, Desai et al., 1996, Lacher et al., 1995, Lee et al., 1996, Mobley et al., 2000, Piramuthu, 1999, Salchenberger et al., 1997, Sharda and Wilson, 1996). This is especially true in situations where a lack of domain knowledge prevents any valid argument from being made for an appropriate model selection bias.
Even though the modeling flexibility of neural networks makes them a very attractive and interesting alternative for pattern learning purposes, many practical problems remain when implementing them: What is the impact of the initial weight choice? How should the weight decay parameter be set? How can the network be prevented from fitting the noise in the training data? These and other issues are often dealt with in ad hoc ways. Nevertheless, they are crucial to the success of any neural network implementation. Another major objection to the use of neural networks for practical purposes remains their widely proclaimed lack of explanatory power. Neural networks, it is said, are black boxes. In this article, Bayesian learning (Bishop, 1995, Neal, 1996) is suggested as a way to deal with these issues during neural network training in a principled, rather than an ad hoc, fashion.
We set out to explore and demonstrate the explicative capabilities of neural network classifiers trained using an implementation of MacKay's (1992a) evidence framework approach to Bayesian learning for optimizing an automatic relevance determination (ARD) regularized objective function (MacKay, 1994, Neal, 1998). The ARD objective function scheme allows us to determine the relative importance of inputs to the trained model. The empirical evaluation in this article is based on the modeling work performed in the context of the baseline benchmarking study of Viaene et al. (2002).
The importance of input relevance assessment needs no underlining. It is not uncommon for domain experts to ask which inputs are relatively more important. Specifically, which inputs contribute most to the detection of insurance claim fraud? This is a very reasonable question. Methods for input selection are not only capable of improving the human understanding of the problem domain, in this case the diagnosis of insurance claim fraud, but also allow for more efficient and lower-cost solutions. In addition, penalization or elimination of (partially) redundant or irrelevant inputs may effectively counter the curse of dimensionality (Bellman, 1961). In practice, adding inputs (even relevant ones) beyond a certain point can actually reduce the performance of a predictive model. This is because, faced with the limited data availability typical in practice, increasing the dimensionality of the input space will eventually lead to a situation where this space is so sparsely populated that it very poorly represents the true model in the data. This phenomenon has been termed the curse of dimensionality. The ultimate objective of input selection is, therefore, to select the minimum number of inputs required to capture the structure in the data.
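The sparsity argument behind the curse of dimensionality can be made concrete with a toy calculation. The sketch below (the choice of 3 bins per input is an assumption for illustration, not from the article) shows how quickly the average number of samples per cell of the input space collapses as the dimension grows, even with all 1,399 claims of the data set at hand:

```python
# Illustrative only: with a fixed sample size n, the average number of
# samples per cell of the input space collapses as the dimension d grows.
# Each input axis is split into k bins (k = 3 is an assumed toy value).
n, k = 1399, 3
for d in (2, 5, 10, 25):
    cells = k ** d                      # number of cells in the d-dim. grid
    print(d, cells, n / cells)          # samples per cell shrinks exponentially
```

Already at 10 inputs there are far more cells than claims, so most of the input space is empty; at 25 binary red flags plus 12 other inputs the situation is dramatically worse, which is why pruning redundant inputs matters.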
This article is organized as follows. Section 2 revisits some basic theory on multilayer neural networks for classification. Section 3 elaborates on input relevance determination. The evidence framework approach to Bayesian learning for neural network classifiers is discussed in Section 4. The theoretical exposition in the first three sections is followed by an empirical evaluation. Section 5 describes the characteristics of the 1993 Massachusetts, USA PIP closed claims data that were used. Section 6 describes the setup of the empirical evaluation and reports its results. Section 7 concludes this article.
Section snippets
Neural networks for classification
Fig. 1 shows a simple three-layer neural network. It is made up of an input layer, a hidden layer and an output layer, each consisting of a number of processing units. The layers are interconnected by modifiable weights, represented by the links between the layers. A bias unit is connected to each unit other than the input units. The function of a processing unit is to accept signals along its incoming connections and (nonlinearly) transform a weighted sum of these signals, termed its activation.
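The forward computation of such a three-layer network can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the layer sizes, the tanh hidden transfer function and the sigmoid output (which yields a value in (0, 1) interpretable as a class-membership probability) are assumptions for the sketch.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a simple three-layer network: each unit forms a
    weighted sum of its incoming signals plus a bias, then applies a
    nonlinear transfer function."""
    hidden = np.tanh(W1 @ x + b1)                        # hidden-layer outputs
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))     # sigmoid output unit

rng = np.random.default_rng(0)
d, h = 4, 3                                   # 4 inputs, 3 hidden units, 1 output
W1, b1 = rng.normal(size=(h, d)), np.zeros(h)
W2, b2 = rng.normal(size=(1, h)), np.zeros(1)
p = mlp_forward(rng.normal(size=d), W1, b1, W2, b2)
print(p)        # a value in (0, 1)
```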
Input relevance determination
The ARD objective function allows us to control the size of the weights associated with each input separately. Large αm values suppress the weights leaving the respective input and effectively switch its contribution to the functioning of the MLP classifier to a lower level. This means that all inputs can be rank ordered according to their optimized αm values. Inputs associated with larger αm values are less relevant to the neural network; the most relevant inputs will have the lowest αm values.
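Once the αm values have been optimized, the ranking itself is a one-liner. The sketch below uses made-up input names and α values purely for illustration (none of them come from the claims data):

```python
# Hypothetical optimized ARD hyperparameters, one alpha_m per input.
# Names and values are invented for illustration. A small alpha_m means
# the weights leaving that input are only weakly suppressed, i.e. the
# input is more relevant to the trained network.
alpha = {"claim_amount": 0.4, "red_flag_3": 12.0, "age": 3.1, "red_flag_7": 0.9}

ranking = sorted(alpha, key=alpha.get)   # most relevant input first
print(ranking)                           # ascending alpha_m order
```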
Evidence framework
The aim of Bayesian learning or Bayesian estimation (Bishop, 1995, Neal, 1996) is to develop probabilistic models that fit the data, and to make optimal predictions using those models. The conceptual difference between Bayesian estimation and maximum likelihood estimation is that we no longer view model parameters as fixed, but rather treat them as random variables characterized by a joint probability model. This stresses the importance of capturing and accommodating the inherent uncertainty in the model parameter estimates.
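The flavor of MacKay-style evidence-based hyperparameter re-estimation can be shown on a toy case. The sketch below applies the standard updates (γ counts the effective number of well-determined parameters, and the weight-decay hyperparameter α is re-estimated as γ divided by the squared weight norm) to a linear model with Gaussian noise, where the Laplace approximation underlying the framework is exact. The synthetic data, the known noise level and the linear model are all assumptions of the sketch, not the article's neural network setting:

```python
import numpy as np

# Toy evidence-framework re-estimation on a linear-Gaussian model.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.3 * rng.normal(size=200)

beta = 1.0 / 0.3**2        # noise precision (assumed known here)
alpha = 1.0                # initial weight-decay hyperparameter
for _ in range(20):
    A = beta * X.T @ X + alpha * np.eye(5)        # Hessian of the objective
    w_mp = beta * np.linalg.solve(A, X.T @ y)     # most probable weights
    eigvals = np.linalg.eigvalsh(beta * X.T @ X)
    gamma = np.sum(eigvals / (eigvals + alpha))   # well-determined parameters
    alpha = gamma / (w_mp @ w_mp)                 # MacKay's re-estimation step
print(gamma, w_mp)
```

With 200 well-separated observations all five weights are well determined, so γ converges close to 5 and the most probable weights sit near the true ones; for the classification networks in the article the quadratic data term is replaced by a cross-entropy error and the Hessian is evaluated at the trained weights.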
PIP claims data
The empirical evaluation in Section 6 is based on a data set of 1,399 closed PIP automobile insurance claim files from accidents that occurred in Massachusetts, USA during 1993, and for which information was meticulously collected by the Automobile Insurers Bureau (AIB) of Massachusetts, USA. For all the claims the AIB tracked information on 25 binary fraud indicators (also known as red flags) and 12 nonindicator inputs, specifically discretized continuous inputs, that are all supposed to make a claim appear more or less suspicious.
Empirical evaluation
In this section, we demonstrate the intelligible soft input selection capabilities of MLP-ARD using the 1993 Massachusetts, USA PIP automobile insurance closed claims data. The produced input importance ranking will be compared with the results from popular logistic regression and decision tree learning. For this study, we have used the models that were fitted to the data for the baseline benchmarking study of Viaene et al. (2002).
In the baseline benchmarking study, we contrasted the predictive performance of several state-of-the-art supervised classification techniques.
Conclusion
Understanding the semantics that underlie the output of neural network models is an important aspect of their acceptance by domain experts for routine analysis and decision making purposes. Hence, we explored the explicative capabilities of neural network classifiers with automatic relevance determination weight regularization, and reported the findings of applying these networks for personal injury protection automobile insurance claim fraud detection. The regularization scheme was aimed at determining the relative importance of the inputs to the trained model.
References (59)
- et al., A comparison of neural networks and linear scoring models in the credit union environment, European Journal of Operational Research (1996)
- et al., A neural network for classifying the financial health of a firm, European Journal of Operational Research (1995)
- et al., Hybrid neural network models for bankruptcy predictions, Decision Support Systems (1996)
- et al., Predictions of coronary artery stenosis by artificial neural network, Artificial Intelligence in Medicine (2000)
- Financial credit-risk evaluation with neural and neurofuzzy systems, European Journal of Operational Research (1999)
- et al., Using neural networks to aid the diagnosis of breast implant rupture, Computers and Operations Research (1997)
- Logistic regression using the SAS system: Theory and application (1999)
- et al., An empirical comparison of voting classification algorithms: Bagging, boosting and variants, Machine Learning (1999)
- Adaptive control processes (1961)
- Gradient-based optimization of hyper-parameters, Neural Computation (2000)
- Neural networks for pattern recognition
- Random forests, Machine Learning
- Classification and regression trees (CART)
- Estimating probabilities: A crucial task in machine learning
- Regression, prediction and shrinkage (with discussion), Journal of the Royal Statistical Society: Methodological 45
- Computer-intensive methods in statistics, Scientific American
- Pattern classification