On the communal analysis suspicion scoring for identity crime in streaming credit applications

doi:10.1016/j.ejor.2008.02.015

European Journal of Operational Research

Volume 195, Issue 2, 1 June 2009, Pages 595-612

https://doi.org/10.1016/j.ejor.2008.02.015

Abstract

This paper describes a rapid technique, communal analysis suspicion scoring (CASS), for generating numeric suspicion scores on streaming credit applications based on their implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS is scalable for this large data sample, and can rapidly detect early symptoms of identity crime. In addition, new insights have been observed from the relationships between applications.

Introduction

Annually, credit bureaus collect millions of enquiries from financial institutions (subscribers) relating to credit applications. In Australia, credit card and personal loan applications have increased significantly, and currently, close to half a million credit bureau enquiries are made per month (Baycorp, 2005). Each credit application contains a large number of identity attributes such as personal names, address(es), telephone number(s), driver licence number (or social security number), date-of-birth, and other personal identifiers which are potentially available to the credit bureau (if local privacy laws permit it). Therefore, from a commercial perspective, it is uneconomical to physically validate and approve each attribute in every credit application.

Application fraud, a manifestation of identity crime, is present when application forms contain plausible and synthetic identity information (identity fraud), and/or real but stolen identity information (identity theft). In developed countries, the monetary cost of application fraud and identity crime is often estimated to be in the billions of dollars, and this is strongly correlated with the large volume of widely available personal information. By performing better once-off assessments in the first stage of the credit life cycle, credit scoring processes could be improved and some transactional fraud could be prevented.

Typical commercial techniques for the identification of such fraud involve the use of attribute verification rules using reference tables, and pair-wise matching rules between credit application and credit history data. However, the success rate of rule-based approaches can be weak when faced with increasingly common fraudster-tampered applications (Oscherwitz, 2005) which have valid attributes and no credit history. Other techniques being used include known fraud matching using blacklists (list of applications previously submitted by fraudsters) and supervised modelling/classification using labelled data. Often, these labelled data approaches alone are operationally inefficient and ineffective (Phua et al., 2005). Our work focuses on credit application data only, with no checks carried out against credit history.

As it is simulated to run in real-time, communal analysis suspicion scoring (CASS) does not take class labels into account when scoring applications. It only uses class labels to determine the effectiveness of its approach. Its purpose is to generate numeric suspicion scores on streaming credit applications based on implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs.
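The four components listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the exact-match comparison, the half-life decay, the postcode-based spatial weight, and the top-k smoothing rule are all assumptions chosen for illustration, and the black/white/anomalous link categories are left out entirely. No class labels are used when scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical record layout; the real attributes are not given here."""
    app_id: int
    day: int            # arrival day within the stream
    postcode: str       # stand-in for the spatial attribute
    identifiers: tuple  # e.g. (name, phone, licence number)


def pair_score(a, b, half_life_days=30.0, other_area_weight=0.5):
    """Score one application-pair; every weight here is an assumption."""
    # Communal part: share of identifier attributes that match exactly.
    matches = sum(x == y for x, y in zip(a.identifiers, b.identifiers))
    communal = matches / len(a.identifiers)
    # Temporal weight: halves for every `half_life_days` between arrivals.
    temporal = 0.5 ** (abs(a.day - b.day) / half_life_days)
    # Spatial weight: full weight inside the same postcode, reduced otherwise.
    spatial = 1.0 if a.postcode == b.postcode else other_area_weight
    return communal * temporal * spatial


def suspicion_score(new_app, window, k=3, smoothing=1.0):
    """Combine the k strongest links to earlier applications into one score."""
    top = sorted((pair_score(new_app, old) for old in window), reverse=True)[:k]
    # Dividing by k + smoothing damps scores that rest on very few links.
    return sum(top) / (k + smoothing)
```

In a streaming setting, `suspicion_score` would be called once per arriving application against a window of recent applications, and the resulting scores ranked so that only the highest-scoring applications are investigated.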

In Section 2, this paper explores and compares semi-related literature and contrasts the research findings with this paper’s contributions. Section 3 describes a rapid technique known as CASS for scoring and generating links on incoming current/new application streams on demand, focusing on the level of each pair of linked applications (application-pair). Section 4 first lays out the retrospective and stream processing experimental results on ethically-approved data, and then visualises and discusses the unique patterns of identity crime in credit applications. Section 5 concludes the paper.

Section snippets

Related work

To the best of our knowledge (Phua et al., 2005), there is no academic research into the scoring of dynamic credit applications which accounts for their sparse-identifier, communal, temporal, and spatial aspects. However, there are other related and established application fields. Section 2.1 summarises multi-attribute pair-wise matching (for example, record linkage/de-duplication detection and an interesting effort to discover cheating amongst teachers in exams by altering their students’

Communal analysis suspicion scoring (CASS) process

In this section, CASS compresses multiple identifier attributes to a single attribute vector representation of each link/non-link (Section 3.2). The approach distinguishes between three different categories of links: black, white, and anomalous, which will result in different weights and scores for every application-pair (Section 3.3). It accounts for the temporal and spatial effects by applying weights to each linked application-pair’s communal score computation (Section 3.4). Also, this

Experiments

All experiments were performed on a single Pentium IV 3.0 GHz workstation with 2 GB of RAM, running Windows XP. CASS itself is written in Visual Basic and works together with C#.NET libraries, and the credit application data is stored in Microsoft Access. In this section, the ethically-approved data sample, privacy and confidentiality issues, and the attributes are explained (Section 4.1). The analysis of individual non-identity attributes highlights some general non-compliant behavioural

Conclusion

CASS is a new, low false alarm, rapid credit application fraud detection tool and technique which is complementary to those already in existence. It identifies a large number of individuals and keeps scores of the relations between them, detecting normal/abnormal and small/large similarities. The results presented here indicate substantial cost savings by investigating/rejecting only a few hundred of the most suspicious credit applications out of a few hundred thousand. The investigations,

Acknowledgements

The first author was financially supported by the Australian Research Council under Linkage Grant Number LP0454077 whilst he was a PhD candidate at Monash University. Ethics approval has been granted by Monash SCERH under Project Number 2005/694ED. The real credit application data has been provided by Veda Advantage. Special thanks go to the developers of yEd and particular participants in the Credit Scoring and Credit Control (CSCC05) conference for useful comments.

References (18)

Baxter, R., Christen, P., Churches, T., 2003. A Comparison of fast blocking methods for record linkage. In: Proceedings...
Baycorp Advantage, 2005. Zero-Interest Credit Cards Cause Record Growth In Card...
Bilenko, M., et al., 2003. Adaptive name matching in information integration. IEEE Intelligent Systems.
Chapman, S., 2005. Simmetrics – Open Source Similarity Measure Library. Accessed from:...
Cortes, C., et al., 2003. Computational methods for dynamic graphs. Journal of Computational and Graphical Statistics.
Cortes, C., Pregibon, D., 1999. Information mining platforms: an infrastructure for KDD rapid deployment. In:...
Fawcett, T., et al., 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery.
ID Analytics, 2004. Identity 2004: The Identity Risk Management...
Kleinberg, J., 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM.

There are more references available in the full text version of this article.

