An artificial intelligence system for predicting customer default in e-commerce
Introduction
E-commerce vendors in Germany have to deal with a peculiarity: commonly used payment types like credit cards and PayPal hold relatively low market shares, and the majority of orders are processed using open invoice instead. With open invoice, a vendor bills customers for goods and services only after delivery of the product. Thus, the vendor grants customers a credit to the extent of the invoice. Usually, the vendor sends customers an invoice statement as soon as the products are delivered or provided. The invoice contains a detailed statement of the transaction. Because the customer receives a purchase before payment, the invoice is called open, and it is closed once the payment is received. Around 28% of customers in Germany choose open invoice as their payment type (Frigge, 2016), and around 68% of customers name open invoice as one of their favorite payment types (Fittkau & Maa Consulting, Wach). However, open invoice is prone to payment disruptions. Among the most common reasons, vendors find that customers simply forget to settle the bill or delay the payment on purpose. In addition, around 53% of vendors state that insolvency is one of the most common reasons for payment disruption (Weinfurtner, Weisheit, Wittmann, Stahl, & Pur, 2011). The majority of the cases that end in default on payment in Germany are nowadays orders with open invoices, with more than 8% of all orders defaulting (Seidenschwarz, Weinfurtner, Stahl, & Wittmann, 2014). E-commerce vendors find themselves in a conflict: offering open invoice incentivizes many clients to confirm their purchases but, at the same time, increases the rate of default on payment. The former aspect has a positive effect on revenue, while the latter drives it down. Additionally, default on payment has a negative impact on the profit margin, due to costs arising through the provision of services and advance payments to third parties.
To break out of this vicious circle, vendors can fall back on a plethora of methods. Many tackle this conflict by implementing exclusion rules for customer groups they consider especially default-prone (for instance, customers who are unknown to the vendor or whose order values are conspicuously high). Another approach, used by more than 30% of e-commerce vendors in Germany, is to fall back on external risk-management services (Weinfurtner et al., 2011). Risk management applications are aimed at detecting customers with a high risk of defaulting. Those applications are frequently built using credit scoring (CS) models. CS analyzes historical data to isolate meaningful characteristics that are used to predict the probability of default (Mester, 1997). However, the probability of default is not an attribute of potential customers but merely a vendor’s assessment of whether the potential customer is a risk worth taking. Over the years, CS has evolved from a vendor’s subjective “gut” decision to a method based on statistically sound models (Thomas, Edelman, & Crook, 2002). Among the providers of risk management services in Germany is the risk management division of Arvato Financial Solutions (AFS), which provides a number of services, including identification of individuals, evaluation of credit-worthiness, and fraud recognition. The AFS databases contain 21 million solvency observations covering 7 million individuals in Germany, addresses and change-of-address information, and bank account information, as well as phone numbers, email addresses, and device information. AFS’s risk management service for e-commerce is called Risk Solution Services (RSS). RSS covers the entire order process and provides a number of services for every stage of it. The main service for evaluating customers’ default probability is called risk check and is split into a pre-risk check and a main risk check.
The main risk check is based on a credit agency score that uses country-specific solvency information on individuals. Hence, the main risk check is inoperable in countries without accessible solvency information. By contrast, the pre-risk check was designed to always be operable and to ensure that the risk check returns an evaluation of the customers’ default probability. For this purpose, the pre-risk check uses data transmitted by the customer during the order process. However, in several industrial settings the pre-risk check is currently based on a generic model, sometimes even without sound statistical backing (Lessmann, Baesens, Seow, & Thomas, 2015).
The objective of this work is to use genetic programming (GP) to build a CS model to replace the existing RSS pre-risk check. This continues a recent line of research aimed at using technology to improve risk management (Lessmann et al., 2015). Inspired by Darwin’s theory of evolution, GP (Koza, 1992a) is a computational intelligence (CI) method that employs evolutionary mechanisms such as inheritance, selection, crossover, and mutation to gradually evolve new solutions to a problem. In a CS environment, GP initializes a population of discriminant functions to classify customers into bad and good ones (hereafter called bads and goods for simplicity). This population is subsequently evolved to find the best possible discriminant function. The motivation for using a CI method to tackle the problem comes from Marques, Garcia, and Sanchez (2013), who discuss five major characteristics of CI systems that are especially appealing in CS: learning, adaptation, flexibility, transparency, and discovery. Learning describes the ability to learn decisions and tasks from historical data. Adaptation represents the capability to adapt to a changing environment, i.e., without being restricted to specific situations or economic conditions. The flexibility of CI systems allows for utilization even with incomplete or unreliable datasets. Furthermore, Marques and colleagues state that CI systems may be transparent, in the sense that resulting decisions may be visible and thus at least partially explainable in some cases. Lastly, discovery represents the ability to find previously unknown relationships. Within the wide field of CI, our focus on GP follows the same motivations as in Ong, Huang, and Tzeng (2005), where it is argued that GP has a number of attractive characteristics for its application in CS. First, it is a non-parametric tool that is not restricted to specific situations or datasets and can therefore be used in a wide range of contexts. Second, it automatically determines the most fitting discriminant function. Last but not least, GP can automatically select the most important variables during the learning phase. Indeed, research has already shown the benefits of GP and its utility in CS (see Section 3 for a detailed discussion of the state of the art). However, CS is usually employed with data from the financial sector, while other sectors have rarely been considered so far. In this work, for the first time to the best of our knowledge, we extend current research in CS by employing GP on a dataset that contains orders from e-commerce vendors.
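The evolutionary loop described above (a population of discriminant functions refined through selection, crossover, and mutation) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the function set (addition, subtraction, multiplication), tournament selection, the mutation rate, the node limit, and the use of plain classification accuracy as fitness, with a customer classified as bad when the discriminant is positive.

```python
import operator
import random

OPS = [operator.add, operator.sub, operator.mul]  # assumed function set
MAX_NODES = 50  # assumed size limit to keep trees from bloating

def random_tree(rng, n_features, depth):
    """Grow a random tree: ('x', i), ('c', value) or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(OPS),
            random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))

def evaluate(tree, row):
    """Compute the discriminant value of one observation."""
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    op, left, right = tree
    return op(evaluate(left, row), evaluate(right, row))

def fitness(tree, rows, labels):
    """Accuracy when a positive discriminant means 'bad' (label 1)."""
    hits = sum((evaluate(tree, r) > 0) == (y == 1) for r, y in zip(rows, labels))
    return hits / len(rows)

def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if tree[0] not in ("x", "c"):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for index in path:
        tree = tree[index]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Insert a random subtree of b at a random position in a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree, n_features):
    """Replace a random subtree with a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, n_features, 2))

def tournament(rng, scored, k=3):
    return max(rng.sample(scored, k), key=lambda s: s[0])[1]

def evolve(rows, labels, pop_size=60, generations=20, seed=0):
    """Return (fitness, tree) of the best discriminant function found."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    pop = [random_tree(rng, n_features, 3) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = [(fitness(t, rows, labels), t) for t in pop]
        gen_best = max(scored, key=lambda s: s[0])
        if best is None or gen_best[0] > best[0]:
            best = gen_best
        next_pop = [best[1]]  # elitism: carry the best tree forward
        while len(next_pop) < pop_size:
            parent = tournament(rng, scored)
            child = crossover(rng, parent, tournament(rng, scored))
            if rng.random() < 0.2:
                child = mutate(rng, child, n_features)
            if sum(1 for _ in paths(child)) > MAX_NODES:
                child = parent  # reject oversized offspring
            next_pop.append(child)
        pop = next_pop
    return best
```

Calling `evolve(rows, labels)` on a list of numeric feature rows and 0/1 labels returns the best accuracy seen and the corresponding expression tree. Because only the variables that survive in that tree affect the score, the sketch also hints at how GP can perform variable selection implicitly.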
This work is organized as follows. Section 2 contains a general introduction to the theoretical framework of CS. In Section 3, previous and related work is analyzed and discussed. Section 4 presents the RSS and the services it provides for every stage of the order process. In Section 5, we describe the dataset used in this work, which was provided by AFS. Section 6 presents the organization of our experimental study and a discussion of our experimental settings. In Section 7, we present and discuss the experimental results. Finally, Section 8 concludes the paper and proposes ideas for future research. The paper closes with Appendix A, in which we briefly introduce GP for readers who are not familiar with this computational method and suggest bibliographic material for further reading on the subject.
Section snippets
Theoretical framework
CS is widely used by financial institutions to determine applicants’ default probability and subsequently classify them into good applicants (the “goods”, for simplicity) or bad applicants (the “bads”) (Thomas et al., 2002). Consequently, applicants may be rejected or accepted as customers based on that classification. Thus, CS represents a binary classification problem (Henley, 1995). The binary response variable represents a default in payment by the customer, or potential default in
Literature review
CS is currently a widely studied research field, and several important contributions have appeared. For a detailed survey of classification algorithms for CS, the reader is referred to Lessmann et al. (2015). Since an exhaustive coverage of all existing contributions is unrealistic in the limited space available, we organize this section in the following way: in the first part, we present the history and evolution of the field, while in the last part, we focus on the most
Risk solution service
Risk Solution Service (RSS) is a risk management service that aims to cover the whole order process of e-commerce retailers’ customers. Its objectives are threefold. First, increase conversion rate and customer retention in the e-shop by improving differentiation and managing of payment methods. Second, enhance cost control by providing innovative pricing models and configurable standard solutions in different service levels. Third, improve discriminatory power by combining current and
Dataset
The dataset used in this work consists of order requests processed by RSS between 10-01-2014 and 12-31-2015, and it is provided by the AFS company. It contains 56,669 order requests, among which 15,535 ( ≈ 27%) are labeled as “bad”, while the remaining 41,134 are labeled as “good”. These order requests are subject to a stratified random split into a training set with 31,669 ( ≈ 56%) observations, a test set with 10,000 ( ≈ 18%) observations and a validation set with 15,000 ( ≈ 26%)
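As a rough illustration of such a split, the sketch below partitions observation indices into three parts of the stated sizes while keeping the share of bads close to the overall 27% in each part. It assumes that stratification is done on the good/bad label, which the text above does not state explicitly; the function name `stratified_split` and the seed are hypothetical, and the labels in the usage note are synthetic placeholders that only reproduce the class counts.

```python
import random

def stratified_split(labels, sizes, seed=0):
    """Split indices into parts of the given sizes with a similar bad rate.

    labels: sequence of 0 (good) / 1 (bad); sizes: part sizes summing to len(labels).
    Assumption: stratification is on the binary label only.
    """
    if sum(sizes) != len(labels):
        raise ValueError("part sizes must sum to the number of observations")
    rng = random.Random(seed)
    bad = [i for i, y in enumerate(labels) if y == 1]
    good = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(bad)
    rng.shuffle(good)
    bad_rate = len(bad) / len(labels)
    parts = []
    b = g = 0
    for size in sizes[:-1]:
        n_bad = round(size * bad_rate)   # bads allotted to this part
        n_good = size - n_bad            # the rest are goods
        parts.append(bad[b:b + n_bad] + good[g:g + n_good])
        b += n_bad
        g += n_good
    parts.append(bad[b:] + good[g:])     # last part takes the remainder
    return parts
```

With the counts reported above, `stratified_split([1] * 15535 + [0] * 41134, [31669, 10000, 15000])` yields training, test, and validation index lists of exactly those sizes.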
Experimental organization and settings
When GP is employed to solve complex problems, like the one tackled in this paper, the use of an appropriate fitness function is often a crucial step. In this work, after considering several other possible measures, we have decided to use the area under the receiver operating characteristic (ROC) curve (ROC-AUC). ROC-AUC is the single-scalar representation of the ROC curve (Abdou & Pointon, 2011). The ROC curve is used when a classifier returns a numeric value that has to be interpreted as a
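To make the fitness measure concrete, the following sketch computes ROC-AUC from labels and numeric scores using the rank-sum (Mann-Whitney U) identity, which equals the probability that a randomly chosen bad receives a higher score than a randomly chosen good, with ties counted as one half. The convention that label 1 means bad and that higher scores indicate higher default risk is an assumption made for this example, as is the function name `roc_auc`.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum identity.

    labels: 1 = bad, 0 = good (assumed convention);
    scores: higher value = more likely bad (assumed convention).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of observations tied on the score.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_bad = sum(label for _, label in pairs)
    n_good = n - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("ROC-AUC needs at least one good and one bad")
    rank_sum_bad = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_bad - n_bad * (n_bad + 1) / 2
    return u_statistic / (n_bad * n_good)
```

A value of 0.5 corresponds to a scoring function that ranks no better than chance, and 1.0 to one that ranks every bad above every good. Since only the ordering of the scores matters, the measure does not depend on any particular classification threshold.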
Experimental results
The presentation of the experimental results is organized as follows: in Section 7.1, we present the results obtained by GP in the CS problem described so far, and we dedicate particular attention to a discussion and an interpretation of the best model evolved by GP. In Section 7.2, we discuss the results we have obtained in the calibration phase. In Section 7.3, we compare GP and other machine learning methods. Finally, in Section 7.4, we discuss the results we obtained when GP was first
Conclusions and future work
The objective of this work was to develop a credit scoring (CS) model to replace the pre-risk check of the e-commerce risk management system Risk Solution Services (RSS), which is currently one of the most used systems to estimate customers’ default probabilities. The pre-risk check uses data from the order process and includes exclusion rules and a generic CS model. The new model was supposed to work as a replacement for the whole pre-score and had to be able to work in isolation and in
References (75)
Genetic programming for credit scoring: The case of Egyptian public sector banks. Expert Systems with Applications (2009)
et al. An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics (1955)
ROC graphs: Notes and practical considerations for data mining researchers. HP Invent (2003)
Frigge, D. (2016). Online-payment...
Making large-scale SVM learning practical. Technical Report (1998)
et al. The Royal London space planning: An integration of space analysis and treatment planning. American Journal of Orthodontics and Dentofacial Orthopedics (2000)
Kruppa, J., Schwarz, A., Arminger, G., & Ziegler, A. (2013). Consumer credit risk: Individual probability estimates...
et al. A note on Platt's probabilistic outputs for support vector machines. Machine Learning (2007)
et al. Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research (2002)
What’s the point of credit scoring? Business Review (1997)
A framework for data transformation in credit behavioral scoring applications based on model driven development. Expert Systems with Applications
Soft margins for AdaBoost. Machine Learning
A note on the comparison of logit and discriminant models of consumer credit behavior. The Journal of Financial and Quantitative Analysis
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. ICML
Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance & Management
Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Systems with Applications
Survival mixture models in behavioral scoring. Expert Systems with Applications
Benchmarking state of the art classification algorithms for credit scoring. Journal of the Operational Research Society
Verification of forecasts expressed in terms of probability. Monthly Weather Review
A C++ framework for geometric semantic genetic programming. Genetic Programming and Evolvable Machines
Machine-learning algorithms for credit-card applications. IMA Journal of Management Mathematics
The comparison and evaluation of forecasters. The Statistician
A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research
Introduction to evolutionary computing
An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market. European Journal of Operational Research
DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research
Introducing recursive partitioning for financial classification: The case of financial distress. The Journal of Finance
An insight into the experimental design for credit risk and corporate bankruptcy prediction systems. Journal of Intelligent Information Systems
Ensemble methods in data mining: Improving accuracy through combining predictions. Synthesis Lectures on Data Mining and Knowledge Discovery
Genetic algorithms in search, optimization and machine learning
Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society)
Multi-class AdaBoost. Statistics and its Interface