Internal fraud risk reduction: Results of a data mining case study

https://doi.org/10.1016/j.accinf.2009.12.004

Abstract

Corporate fraud represents a huge cost to the current economy. Academic literature has demonstrated how data mining techniques can be of value in the fight against fraud. This research has focused on fraud detection, mostly in the context of external fraud. In this paper, we discuss the use of a data mining approach to reduce the risk of internal fraud. Reducing fraud risk involves both detection and prevention. Accordingly, we apply a descriptive data mining strategy rather than the predictive data mining techniques widely used in the literature. The results of applying a multivariate latent class clustering algorithm to a case company's procurement data suggest that this technique, used in a descriptive data mining approach, is useful in assessing the current risk of internal fraud. The same results could not be obtained by applying a univariate analysis.

Introduction

It is nothing new to say that fraud is an important, if unwelcome, part of business. Fraud is a multi-million dollar business concern, as several research studies reveal and as reflected in recent surveys by the Association of Certified Fraud Examiners (ACFE, 2008) and PricewaterhouseCoopers (PwC, 2007). The ACFE study, conducted in the United States in 2007–2008, reported that companies estimate they lose 7% of their annual revenues to fraud. Applied to the 2008 United States Gross Domestic Product of US$ 14,196 billion, this would translate to approximately US$ 994 billion in fraud losses for the United States. The PwC worldwide study revealed that 43% of the companies surveyed had fallen victim to economic crime in 2006 and 2007 (PwC, 2007). The average financial damage to these companies was US$ 2.42 million per company over the two years. These survey reports demonstrate the magnitude of the fraud that companies face today.

Numerous academic studies have used data mining to investigate fraud detection (Brockett et al., 2002; Cortes et al., 2002; Estévez et al., 2006; Fanning & Cogger, 1998; Kim & Kwon, 2006; Kirkos et al., 2007). Although prior research has investigated fraud in different domains and with different techniques, these studies all focused on external fraud and used predictive data mining. Our study differs in several ways. First, we focus on internal fraud, because internal fraud represents the majority of costs identified by the PwC and ACFE surveys. Second, while prior studies examined only fraud detection, our study investigates internal fraud risk reduction: the combination of fraud detection and fraud prevention. Companies' risk exposure would be substantially greater if they focused only on fraud detection, which is a reactive way of working. In practice, companies use a combination of detection and prevention controls to help minimize their fraud risk; hence, our study provides a more comprehensive view of the real world. Third, prior research used predictive data mining, or more precisely predictive classification techniques, whose purpose is to classify each observation as fraudulent or not. Because we focus on risk reduction rather than detection, we believe descriptive data mining is better suited. Descriptive data mining provides insights into the complete data set rather than into only one aspect of it (i.e., fraudulent or not). This characteristic is valuable for assessing the fraud risk in selected business processes.

The aim of this paper is to provide a framework for both researchers and practitioners to reduce internal fraud risk, and to present empirical results obtained by applying this framework. Based on data collected from an international financial services provider, we investigated fraud risk reduction in the procurement process. The results are promising. In both a subset of recent purchase orders and a subset of old ones, we found a small cluster with a high-risk profile. The population of old purchase orders was small enough that a full examination of this cluster was feasible. Our analysis suggested a closer examination of ten cases. Of these ten purchase orders, nine circumvented procedures (creating windows of opportunity to commit fraud), and one was the result of an error.

In the following sections we explain the methodology used in this study, the data set, the latent class clustering algorithm, and the results of investigating the procurement business process of the case company. We first apply a univariate analysis to explore the data and then a multivariate analysis. We compare the results of both analyses and conclude with the implications of our findings.


Methodology

The applied methodology is the IFR² Framework of Jans et al. (2009), summarized in Fig. 1. The IFR² Framework, which stands for Internal Fraud Risk Reduction, is a conceptual framework to guide research in internal fraud risk reduction. As a first step, an organization should select a business process that it considers worth investigating. The selection of a business process can be motivated by the following reasons: a business process that involves large cash flows, that is unstructured, that

Data set

Based on the selected methodology, we apply the IFR² Framework to a real-life database. The data set used in this study was obtained from an international financial services provider. The corporation ranks among the 20 largest European financial institutions. The business process selected for internal fraud risk reduction is procurement. This selection is inspired by the lack of existing fraud files for the procurement business process within the case

Latent class clustering algorithm

For the descriptive data mining approach, we chose a latent class (LC) clustering algorithm. LC clustering was preferred to the more traditional K-means clustering for several reasons. The most important reason is that this algorithm allows for overlapping clusters. In LC clustering, each observation is assigned a set of probabilities expressing how likely it is to belong to each cluster. For example, in a 3-cluster setting, observation A has p = .80 for cluster 1, p = .20 for cluster 2 and p = .00
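To make the contrast with K-means concrete, here is a minimal sketch that computes soft cluster memberships for one observation under fixed, made-up parameters, next to the single hard label a nearest-centroid (K-means style) rule would give. It assumes normally distributed attributes that are independent within each cluster (local independence); the function names and parameter values are hypothetical illustrations, not the model estimated in this study.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_memberships(x, weights, means, sds):
    """Soft, LC-style assignment: P(cluster k | x) for every cluster k.

    Assumes local independence: within a cluster each attribute is an
    independent normal, so the class-conditional density is a product.
    """
    joint = []
    for w, mu_k, sd_k in zip(weights, means, sds):
        density = w  # prior weight of the cluster
        for xj, m, s in zip(x, mu_k, sd_k):
            density *= normal_pdf(xj, m, s)
        joint.append(density)
    total = sum(joint)
    return [j / total for j in joint]  # probabilities summing to 1


def hard_assignment(x, means):
    """K-means-style assignment: index of the nearest centroid, nothing else."""
    dists = [sum((xj - m) ** 2 for xj, m in zip(x, mu)) for mu in means]
    return dists.index(min(dists))


if __name__ == "__main__":
    # Hypothetical 3-cluster model on two attributes.
    weights = [0.6, 0.3, 0.1]
    means = [[1.0, 2.0], [3.0, 2.5], [10.0, 8.0]]
    sds = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    x = [2.0, 2.2]
    print([round(p, 3) for p in posterior_memberships(x, weights, means, sds)])
    print(hard_assignment(x, means))
```

With these example parameters the observation receives roughly two-thirds probability for cluster 1 and one-third for cluster 2, whereas the nearest-centroid rule places it entirely in cluster 1.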

Model specifications

Before turning to the core of the model applied in the descriptive data mining approach on behavior-describing attributes, we apply univariate clustering to provide a comparative basis for exploring the data. Performing univariate analyses is a common way of exploring the data at hand before turning to more complex analyses, such as multivariate analysis. The univariate analysis is applied to obvious attributes. Three numerical attributes were selected: number of changes (Model A),
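As an illustration of what clustering on a single attribute involves, the sketch below fits a univariate mixture with the EM algorithm and returns each observation's membership probabilities. It assumes normally distributed values and a fixed number of clusters k; both are simplifications (a count attribute such as the number of changes might rather be modelled with a count distribution), and the function name and sample values are hypothetical, not the software or data used in this study.

```python
import math
import statistics


def log_normal_pdf(x, mean, sd):
    """Log-density of a univariate normal distribution at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def fit_univariate_mixture(values, k, n_iter=200, min_sd=1e-3):
    """Fit a k-component Gaussian mixture to one attribute with EM.

    Returns (weights, means, sds, resp), where resp[i][c] is the posterior
    probability that observation i belongs to cluster c.
    """
    n = len(values)
    data = sorted(values)
    # Deterministic start: initial means at evenly spaced quantiles.
    means = [data[int((c + 0.5) * n / k)] for c in range(k)]
    sds = [max(statistics.pstdev(values), min_sd)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (log-sum-exp for stability).
        resp = []
        for x in values:
            logs = [math.log(w) + log_normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and standard deviations.
        for c in range(k):
            nk = sum(r[c] for r in resp)
            weights[c] = max(nk / n, 1e-12)
            if nk < 1e-9:
                continue  # empty cluster: keep its previous mean and sd
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / nk
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / nk
            sds[c] = max(math.sqrt(var), min_sd)
    return weights, means, sds, resp
```

Run on a handful of made-up change counts, for instance `fit_univariate_mixture([0, 1, 1, 2, 0, 1, 2, 1, 0, 1, 15, 16, 14, 17], k=2)`, the sketch separates the four heavily changed POs into a small cluster with a mean near 15.5.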

Model specifications

Although the univariate clustering analysis revealed some interesting deviating characteristics, it yielded contradictory information depending on which attribute was selected for clustering. A multivariate analysis takes several attributes into account simultaneously and is therefore better suited to a real-life scenario than selecting only one attribute at a time. Moreover, a multivariate analysis was needed to carry out the data mining step. Before applying this analysis, the third step of our
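A multivariate latent class model can be sketched as a mixture in which each PO is a vector of attributes and, under the local independence assumption common in latent class models, the class-conditional density is a product of per-attribute normals. The normality assumption, the farthest-first initialisation, and all names here are illustrative choices, not the estimation procedure used in the study.

```python
import math
import statistics


def log_normal_pdf(x, mean, sd):
    """Log-density of a univariate normal distribution at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def fit_lc_diagonal(rows, k, n_iter=200, min_sd=1e-3):
    """EM for a k-class mixture of independent normals (local independence).

    rows: equal-length numeric tuples, one per PO, one value per attribute.
    Returns (weights, means, sds, resp) with resp[i][c] = P(class c | row i).
    """
    n, d = len(rows), len(rows[0])
    # Farthest-first initialisation: start at row 0, then repeatedly add the
    # row farthest (squared Euclidean) from all means chosen so far.
    means = [list(rows[0])]
    while len(means) < k:
        far = max(rows, key=lambda r: min(
            sum((a - b) ** 2 for a, b in zip(r, m)) for m in means))
        means.append(list(far))
    global_sds = [max(statistics.pstdev([r[j] for r in rows]), min_sd) for j in range(d)]
    sds = [list(global_sds) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: class posteriors, computed in log space for stability.
        resp = []
        for r in rows:
            logs = [math.log(weights[c])
                    + sum(log_normal_pdf(r[j], means[c][j], sds[c][j]) for j in range(d))
                    for c in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: per-class weight, and per-class, per-attribute mean and sd.
        for c in range(k):
            nk = sum(p[c] for p in resp)
            weights[c] = max(nk / n, 1e-12)
            if nk < 1e-9:
                continue  # empty class: keep its previous parameters
            for j in range(d):
                mu = sum(p[c] * r[j] for p, r in zip(resp, rows)) / nk
                var = sum(p[c] * (r[j] - mu) ** 2 for p, r in zip(resp, rows)) / nk
                means[c][j] = mu
                sds[c][j] = max(math.sqrt(var), min_sd)
    return weights, means, sds, resp
```

Because every attribute contributes to each row's class posterior, a PO that looks unremarkable on one attribute can still be pulled into a small, deviating class by the others.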

Audit by domain experts

Because auditing all 408 POs of cluster 3 would be too time-consuming, it can be worthwhile to take a sample of POs that were created by one of the creators described above, involve one of those suppliers, or both. To this end, a smaller sample of cluster 3 was extracted by keeping only the POs created by the six creators or involving one of the three suppliers most represented in the cluster. This yielded a sample of 38 POs. Why is it that they merely induced POs in this small
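The sampling rule described above can be expressed in a few lines. In this sketch the creators are passed in as a ready-made set, since they are identified earlier in the analysis; only the suppliers are ranked by how often they occur in the cluster. The record layout (`po_id`, `creator`, `supplier`) and the function name are hypothetical.

```python
from collections import Counter


def audit_sample(cluster_pos, flagged_creators, n_suppliers=3):
    """Select POs from a cluster for manual audit.

    Keeps a PO if its creator is in `flagged_creators` or its supplier is among
    the `n_suppliers` suppliers that occur most often in the cluster (or both).
    Each PO is a dict with the hypothetical keys 'po_id', 'creator', 'supplier'.
    """
    supplier_counts = Counter(po["supplier"] for po in cluster_pos)
    top_suppliers = {s for s, _ in supplier_counts.most_common(n_suppliers)}
    creators = set(flagged_creators)
    # Preserve the original order of the cluster's POs in the sample.
    return [po for po in cluster_pos
            if po["creator"] in creators or po["supplier"] in top_suppliers]
```

Note that `Counter.most_common` breaks ties in the order in which suppliers were first encountered, so with tied counts the chosen "top" suppliers depend on the input order.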

Multivariate versus univariate analysis

The multivariate descriptive data mining approach based on behavior-describing attributes yielded interesting results. In the smaller subset of old POs, we encountered POs that were changed over and over again. In the larger subset as well, frequent changes to a PO are a primary characteristic of the selected observations. However, one might ask whether this outcome could have been obtained much more easily, simply by applying univariate clustering instead of multivariate
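One simple way to check whether a univariate model would have singled out the same POs is to cross-tabulate the modal cluster assignments of the two models, or to measure the overlap between the sets of POs each model flags as high-risk. The sketch below does both under our own assumptions (one label per PO, listed in the same order for both models; flagged sets given as PO ids); it is a generic comparison, not necessarily the one performed in the study.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count how often each (cluster in model A, cluster in model B) pair occurs."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    return Counter(zip(labels_a, labels_b))


def jaccard(flagged_a, flagged_b):
    """Overlap between two sets of flagged PO ids (1.0 means identical sets)."""
    a, b = set(flagged_a), set(flagged_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A Jaccard value well below 1 would indicate that the univariate and multivariate models flag materially different POs.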

Conclusion

In this paper, a methodology for reducing internal fraud risk, the IFR² Framework (Jans et al., 2009), is applied at a European financial institution ranked among the 20 largest. The results of the case study suggest that a descriptive data mining approach using the multivariate latent class clustering technique can add value in reducing the risk of internal fraud in a company. Univariate latent class clustering did not yield the same results. The application of the IFR² Framework


1

Tel.: + 32 11 268602.

2

Tel.: + 32 11 269153.
