O.R. Applications
On the communal analysis suspicion scoring for identity crime in streaming credit applications

https://doi.org/10.1016/j.ejor.2008.02.015

Abstract

This paper describes communal analysis suspicion scoring (CASS), a rapid technique for generating numeric suspicion scores for streaming credit applications based on their implicit links to one another over both time and space. CASS comprises pair-wise communal scoring of the identifier attributes of applications, the definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS scales to this large data sample and can rapidly detect early symptoms of identity crime. In addition, new insights have been gained from the relationships between applications.

Introduction

Annually, credit bureaus collect millions of enquiries relating to credit applications from financial institutions (subscribers). In Australia, credit card and personal loan applications have increased significantly, and close to half a million credit bureau enquiries are currently made per month (Baycorp, 2005). Each credit application contains many identity attributes, such as personal names, address(es), telephone number(s), driver licence number (or social security number), date of birth, and other personal identifiers, which are potentially available to the credit bureau if local privacy laws permit. Given this volume, it is uneconomical from a commercial perspective to manually validate and approve every attribute of every credit application.

Application fraud, a manifestation of identity crime, occurs when application forms contain plausible but synthetic identity information (identity fraud), and/or real but stolen identity information (identity theft). In developed countries, the monetary cost of application fraud and identity crime is often estimated in the billions of dollars, and this cost is strongly correlated with the large volume of widely available personal information. Better once-off assessments in the first stage of the credit life cycle could improve credit scoring processes and prevent some transactional fraud.

Typical commercial techniques for identifying such fraud use attribute verification rules based on reference tables, and pair-wise matching rules between credit application and credit history data. However, rule-based approaches can perform poorly against increasingly common fraudster-tampered applications (Oscherwitz, 2005), which have valid attributes and no credit history. Other techniques in use include known-fraud matching against blacklists (lists of applications previously submitted by fraudsters) and supervised modelling/classification on labelled data. On their own, these labelled-data approaches are often operationally inefficient and ineffective (Phua et al., 2005). Our work focuses on credit application data only; no checks are carried out against credit history.

As it is simulated to run in real time, communal analysis suspicion scoring (CASS) does not use class labels when scoring applications; it uses them only to evaluate the effectiveness of its approach. Its purpose is to generate numeric suspicion scores for streaming credit applications based on their implicit links to one another over both time and space. CASS comprises pair-wise communal scoring of the identifier attributes of applications, the definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs.
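The pair-wise comparison step can be pictured with a minimal Python sketch. It assumes a hypothetical set of identifier attributes (`name`, `address`, `phone`, `licence`, `dob`) and a hypothetical fuzzy-match rule built on the standard library's `difflib.SequenceMatcher` with a 0.8 similarity threshold; the excerpt does not say which attributes or similarity measures CASS actually uses. Each application-pair is reduced to a binary match vector, and any non-zero vector is treated as a link.

```python
from difflib import SequenceMatcher

# Hypothetical identifier attributes; the actual CASS attribute set is not given here.
IDENTIFIERS = ("name", "address", "phone", "licence", "dob")


def attribute_match(a: str, b: str, threshold: float = 0.8) -> int:
    """Return 1 if two attribute values match exactly or are similar enough, else 0."""
    if not a or not b:
        return 0  # a missing value never counts as a match
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return 1
    # Hypothetical fuzzy rule: SequenceMatcher similarity ratio at or above a threshold.
    return int(SequenceMatcher(None, a, b).ratio() >= threshold)


def match_vector(app1: dict, app2: dict) -> tuple:
    """Compress the per-attribute comparisons of a pair into one binary vector."""
    return tuple(attribute_match(app1.get(k, ""), app2.get(k, "")) for k in IDENTIFIERS)


def is_link(vector: tuple) -> bool:
    """Treat a pair as linked if at least one identifier matches."""
    return any(vector)


app1 = {"name": "John Smith", "address": "12 High St", "phone": "0395551234",
        "licence": "A123", "dob": "1970-01-01"}
app2 = {"name": "Jon Smith", "address": "12 High St", "phone": "0395559999",
        "licence": "", "dob": "1970-01-01"}
print(match_vector(app1, app2))  # (1, 1, 0, 0, 1)
```

The misspelt name still matches (similarity about 0.95), while the two phone numbers, which share only a prefix, do not (similarity 0.6); the threshold controls that trade-off.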

Section 2 reviews loosely related literature and contrasts its findings with this paper's contributions. Section 3 describes CASS, a rapid technique for scoring incoming application streams and generating links on demand, which operates at the level of each pair of linked applications (application-pair). Section 4 first presents the retrospective and stream-processing experimental results on ethically approved data, and then visualises and discusses the unique patterns of identity crime in credit applications. Section 5 concludes the paper.
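The on-demand stream scoring can be sketched as follows. This is one plausible reading of "smoothed k-wise scoring", not the paper's formula: each arriving application is compared with a bounded window of recent applications, the scores of its k linked pairs are summed, and the mean score previously assigned to those linked applications is added with a hypothetical smoothing factor `alpha`, so that suspicion can propagate along chains of links. `simple_pair_score`, the window size and `alpha` are all illustrative assumptions.

```python
from collections import deque

# Hypothetical identifiers used by the stand-in pair score below.
IDENTIFIERS = ("name", "address", "phone", "dob")


def simple_pair_score(new: dict, old: dict) -> float:
    """Stand-in pair score: fraction of identifiers with identical non-empty values."""
    shared = sum(1 for k in IDENTIFIERS if new.get(k) and new.get(k) == old.get(k))
    return shared / len(IDENTIFIERS)


class StreamScorer:
    """Scores each arriving application against a bounded window of recent ones."""

    def __init__(self, window: int = 10_000, alpha: float = 0.5):
        self.recent = deque(maxlen=window)  # (application, its score) pairs
        self.alpha = alpha  # hypothetical smoothing factor

    def score(self, app: dict) -> float:
        # Keep only the k previous applications that actually link to this one.
        links = []
        for old, old_score in self.recent:
            s = simple_pair_score(app, old)
            if s > 0:
                links.append((s, old_score))
        pair_total = sum(s for s, _ in links)
        neighbour_mean = sum(prev for _, prev in links) / len(links) if links else 0.0
        total = pair_total + self.alpha * neighbour_mean
        self.recent.append((app, total))  # oldest entry drops out when full
        return total


scorer = StreamScorer(window=3, alpha=0.5)
s_a = scorer.score({"name": "A", "address": "1 St", "phone": "111", "dob": "1970"})
s_b = scorer.score({"name": "B", "address": "2 St", "phone": "111", "dob": "1970"})
s_c = scorer.score({"name": "C", "address": "3 St", "phone": "111", "dob": "1980"})
print(s_a, s_b, s_c)  # 0.0 0.5 0.625
```

The third application shares only a phone number with each earlier one, yet inherits part of the second application's score through the smoothing term; the bounded deque keeps per-arrival cost proportional to the window size rather than the full history.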

Section snippets

Related work

To the best of our knowledge (Phua et al., 2005), there is no academic research into scoring dynamic credit applications that accounts for their sparse-identifier, communal, temporal, and spatial aspects. However, there are other related and established application fields. Section 2.1 summarises multi-attribute pair-wise matching (for example, record linkage/de-duplication detection and an interesting effort to discover cheating amongst teachers in exams by altering their students’
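Comparing every application with every other one grows quadratically, so record-linkage systems usually restrict comparisons with blocking (compare Baxter et al., 2003, in the reference list). These snippets do not say whether or how CASS blocks its candidates; the sketch below only illustrates the general idea with two hypothetical blocking keys (exact phone number, and surname plus postcode), so that only applications sharing a key are compared.

```python
from collections import defaultdict
from itertools import combinations


def blocking_keys(app: dict) -> set:
    """Hypothetical blocking keys: exact phone number, and surname + postcode."""
    keys = set()
    if app.get("phone"):
        keys.add(("phone", app["phone"]))
    if app.get("surname") and app.get("postcode"):
        keys.add(("surname_postcode", app["surname"].lower(), app["postcode"]))
    return keys


def candidate_pairs(apps: list) -> set:
    """Return index pairs of applications that share at least one blocking key."""
    blocks = defaultdict(list)
    for idx, app in enumerate(apps):
        for key in blocking_keys(app):
            blocks[key].append(idx)
    pairs = set()  # a set removes pairs that share more than one key
    for members in blocks.values():
        pairs.update(combinations(members, 2))  # members are in ascending index order
    return pairs


apps = [
    {"phone": "0395551234", "surname": "Smith", "postcode": "3000"},
    {"phone": "0395551234", "surname": "Jones", "postcode": "3141"},
    {"phone": "0298765432", "surname": "smith", "postcode": "3000"},
    {"phone": "0712345678", "surname": "Brown", "postcode": "4000"},
]
print(sorted(candidate_pairs(apps)))  # [(0, 1), (0, 2)]
```

Only 2 of the 6 possible pairs are compared here. The cost is that two applications sharing no blocking key are never compared, so the choice of keys bounds which links can be found.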

Communal analysis suspicion scoring (CASS) process

This section shows how CASS compresses multiple identifier attributes into a single attribute-vector representation of each link/non-link (Section 3.2). The approach distinguishes three categories of links (black, white, and anomalous), which result in different weights and scores for each application-pair (Section 3.3). It accounts for temporal and spatial effects by applying weights to each linked application-pair's communal score computation (Section 3.4). Also, this
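To make the category and weighting steps concrete, here is a hypothetical Python sketch of scoring one application-pair from its binary match vector. The link-type tables that decide whether a vector is white, black, or anomalous, the category weights, the 30-day half-life of the temporal weight, and the postcode-based spatial weight are all invented for illustration; the excerpt names the three categories and says temporal and spatial weights are applied, but does not give their definitions.

```python
# Link types over the hypothetical attribute order (name, address, phone, licence, dob).
WHITE_TYPES = {(0, 1, 0, 0, 0), (0, 1, 1, 0, 0)}  # e.g. household members sharing an address
BLACK_TYPES = {(0, 0, 1, 0, 1), (0, 0, 0, 1, 1)}  # e.g. same phone and DOB, different name
CATEGORY_WEIGHT = {"white": 0.1, "anomalous": 0.5, "black": 1.0}  # hypothetical


def categorise(vector: tuple) -> str:
    if vector in WHITE_TYPES:
        return "white"
    if vector in BLACK_TYPES:
        return "black"
    return "anomalous"  # any link type not listed in either table


def temporal_weight(days_apart: float, half_life: float = 30.0) -> float:
    """Exponential decay: a link's weight halves every `half_life` days."""
    return 0.5 ** (days_apart / half_life)


def spatial_weight(postcode1: str, postcode2: str) -> float:
    """Hypothetical: links between different postcodes count double."""
    return 0.5 if postcode1 == postcode2 else 1.0


def pair_score(vector: tuple, days_apart: float, postcode1: str, postcode2: str) -> float:
    if not any(vector):
        return 0.0  # not a link
    base = CATEGORY_WEIGHT[categorise(vector)] * sum(vector) / len(vector)
    return base * temporal_weight(days_apart) * spatial_weight(postcode1, postcode2)


print(pair_score((0, 0, 1, 0, 1), 0, "3000", "4000"))   # 0.4
print(pair_score((0, 0, 1, 0, 1), 30, "3000", "4000"))  # 0.2
```

Multiplying the factors keeps each one interpretable: a white link, an older link, or a same-postcode link each scales the pair's contribution down independently of the others.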

Experiments

All experiments were performed on a single Pentium IV 3.0 GHz workstation with 2 GB of RAM, running Windows XP. CASS itself is written in Visual Basic and works together with C#.NET libraries, and the credit application data is stored in Microsoft Access. This section explains the ethically approved data sample, privacy and confidentiality issues, and the attributes (Section 4.1). The analysis of individual non-identity attributes highlights some general non-compliant behavioural

Conclusion

CASS is a new, rapid, low-false-alarm credit application fraud detection tool and technique that complements those already in existence. It identifies a large number of individuals and keeps scores of the relations between them, detecting normal/abnormal and small/large similarities. The results presented here indicate substantial cost savings from investigating/rejecting only the few hundred most suspicious credit applications out of a few hundred thousand. The investigations,
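The cost argument rests on ranking applications by suspicion score and investigating only the top few. A minimal sketch of that alert selection and of the two evaluation measures named in the abstract follows. It assumes the conventional definitions (hit rate = detected frauds / all frauds; false alarm rate = alerted legitimate applications / all legitimate applications), which may differ from the paper's exact definitions, and the scores and labels in the example are made up.

```python
def top_n_alerts(scores: dict, n: int) -> set:
    """IDs of the n highest-scoring applications (ties broken by ID)."""
    ranked = sorted(scores, key=lambda app_id: (-scores[app_id], app_id))
    return set(ranked[:n])


def hit_and_false_alarm_rates(alerts: set, frauds: set, all_ids: set) -> tuple:
    """Hit rate = TP / (TP + FN); false alarm rate = FP / (FP + TN)."""
    tp = len(alerts & frauds)
    fp = len(alerts - frauds)
    fn = len(frauds - alerts)
    tn = len(all_ids - alerts - frauds)
    hit_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return hit_rate, false_alarm_rate


scores = {"a1": 0.9, "a2": 0.1, "a3": 0.7, "a4": 0.05, "a5": 0.6, "a6": 0.0}
frauds = {"a1", "a5"}  # made-up labels, used only for evaluation
alerts = top_n_alerts(scores, 2)
print(sorted(alerts), hit_and_false_alarm_rates(alerts, frauds, set(scores)))
# ['a1', 'a3'] (0.5, 0.25)
```

Labels are consulted only after scoring, in line with the point that CASS scores applications without class labels and uses them solely to measure effectiveness.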

Acknowledgements

The first author was financially supported by the Australian Research Council under Linkage Grant Number LP0454077 whilst he was a PhD candidate at Monash University. Ethics approval has been granted by Monash SCERH under Project Number 2005/694ED. The real credit application data has been provided by Veda Advantage. Special thanks go to the developers of yEd and to some participants in the Credit Scoring and Credit Control (CSCC05) conference for useful comments.

References (18)

  • Baxter, R., Christen, P., Churches, T., 2003. A Comparison of fast blocking methods for record linkage. In: Proceedings...
  • Baycorp Advantage, 2005. Zero-Interest Credit Cards Cause Record Growth In Card...
  • Bilenko, M., et al., 2003. Adaptive name matching in information integration. IEEE Intelligent Systems.
  • Chapman, S., 2005. Simmetrics – Open Source Similarity Measure Library. Accessed from:...
  • Cortes, C., et al., 2003. Computational methods for dynamic graphs. Journal of Computational and Graphical Statistics.
  • Cortes, C., Pregibon, D., 1999. Information mining platforms: an infrastructure for KDD rapid deployment. In:...
  • Fawcett, T., et al., 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery.
  • ID Analytics, 2004. Identity 2004: The Identity Risk Management...
  • Kleinberg, J., 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM.
There are more references available in the full text version of this article.

Cited by (22)

  • Boosting credit risk models

    2023, British Accounting Review
  • HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture

    2021, Information Sciences
  • An evolutionary approach to fraud management

    2020, European Journal of Operational Research
  • Fraud detection: A systematic literature review of graph-based anomaly detection approaches

    2020, Decision Support Systems