Combining market and accounting-based models for credit scoring using a classification scheme based on support vector machines

doi:10.1016/j.amc.2014.02.028

Applied Mathematics and Computation

Volume 234, 15 May 2014, Pages 69-81

https://doi.org/10.1016/j.amc.2014.02.028

Highlights

• Combination of option-based model with accounting data for credit risk model.
• Application of market model to non-listed firms.
• Use of a novel additive support vector machines model.

Abstract

Credit risk rating is an important issue for both financial institutions and companies, especially in periods of economic recession. Many different approaches and methods have been developed over the years. The aim of this paper is to create a credit risk rating model, using a machine learning methodology that combines accounting data with the option-based approach of Black, Scholes, and Merton. The model is built on data for companies listed on the Greek stock exchange, and it is also shown to provide accurate results for non-listed firms. Linear and nonlinear support vector machines are used for model building, as well as an innovative additive modeling approach, which enables the construction of comprehensible and accurate credit scoring models.

Introduction

Credit risk refers to the probability that a client will not be able to meet his/her debt obligations (default). Over the years, many factors have contributed to the increasing importance of accurate credit risk measurement. Altman and Saunders [1] list five main issues, which are still valid in the current context: (i) a worldwide structural increase in the number of defaults, (ii) a trend towards disintermediation by the highest quality and largest borrowers, (iii) more competitive margins on loans, (iv) a declining value of real assets (and thus collateral) in many markets, and (v) a dramatic growth of high risk exposures including credit derivatives. Credit risk measurement is nowadays a critical issue as demonstrated by the recent outbreak of the global credit crisis in 2007–2008.

In a credit risk management context, the accurate estimation of the probability of default is a crucial point. Credit rating models (CRMs) are widely used for that purpose. CRMs evaluate the creditworthiness of obligors, estimate the probabilities of default, and classify obligors into risk groups. In a corporate credit granting context, most CRMs combine key financial (accounting) and non-financial data into an aggregate index indicating the credit risk of the firms. Such models can be constructed with a variety of statistical, data mining, and operations research techniques (e.g., logistic regression, neural networks, support vector machines, rule induction algorithms, multicriteria decision making, etc.). Comprehensive reviews of this line of research can be found in [2], [3], [4]. Despite their success and popularity, traditional credit scoring models are mostly static and based on historical accounting data, which may fail to represent adequately the future of the firms and the trends in the business environment [1], [5]. This is particularly important in periods of economic turmoil, when exogenous conditions deteriorate rapidly over a short time, thus affecting corporate activity and leading to increased credit risk levels throughout the market. Mensah [6] and Hillegeist et al. [7] also discuss issues related to the accounting standards and practices, which affect the quality of the information that financial statements provide.

The shortcomings of accounting-based credit scoring models have led to the consideration of a wide variety of alternative approaches (comprehensive overviews can be found in [1], [8]). Among them, structural models have attracted considerable interest. Structural models use stock exchange data to assess the probability of default [9], [10]. Stock prices reflect all the information related to the current status of the firms as well as the investors’ expectations about their future prospects [5]. Furthermore, market data are constantly updated in accordance with new information that becomes available about the operation of firms and the environment in which they operate. These features of market data and models indicate that they may be better suited for default prediction and credit risk measurement. Indeed, several studies provide empirical results in support of market models in the context of credit risk modeling and bankruptcy prediction [5], [7]. Market models have also been shown to contribute to the construction of improved hybrid systems in combination with accounting-based models [11], [12].

Despite their strong theoretical grounds and good predictive power, market models are limited to firms listed in stock exchanges. Therefore, their extension to non-listed firms has attracted some interest over the past decade. Moody’s KMV RiskCalc™ model [13] is a commercial implementation, which has been employed in several countries with positive results [14], [15]. Altman et al. [16] used US data to examine the potential of developing multivariate regression models providing estimates for the probability of default implied by a market model. The authors found that this approach provides similar results to default prediction models, thus concluding that both approaches should be treated as complementary sources of information.

This study extends the results of Altman et al. [16] by investigating the applicability of a market-based credit risk modeling approach in a context where the hypotheses of market efficiency may be invalid [17]. In particular, we test whether a definition of default on the basis of a market model can be employed to build a credit scoring model for non-listed firms and compare the results to a default prediction model fitted on historical default data. The analysis is based on data from Greece over the period 2005–2010 using samples of listed and non-listed firms. The Greek case provides a challenging context for two main reasons. First, after flourishing at the end of the 1990s, the Greek stock market entered a period characterized by increasing volatility, decreasing liquidity, and high market concentration, with a few large capitalization companies dominating the market. These features became even clearer during the international credit crisis and the subsequent sovereign debt crisis that hit the country, thus putting into serious question the efficiency of the Greek stock market [18]. Second, the crisis had a particularly strong effect on the Greek economy, with a sharp deterioration of the general economic and business conditions, which led to an unprecedented increase in the number of defaults and bankruptcies over a very short period of time. Thus, credit risk management becomes a challenging issue in this context, and the peculiarities of the Greek case cast doubt on whether an approach grounded in a market model could actually provide useful results.

On the methodological side, non-parametric machine learning techniques are employed based on the framework of support vector machines (SVMs). The analysis is performed in two stages. First, a market model is used to assess the probability of default for listed companies and classify them into risk groups under different risk-taking scenarios. Risk assessment and classification models are then developed using linear and nonlinear support vector machines, as well as a recently developed innovative additive SVM model that is well suited to the requirements of credit rating systems. Logistic regression is also employed for comparative purposes and feature selection. The developed models are applied to a sample of non-listed firms. The comparison against traditional credit scoring models fitted on historical default data shows that the market-based modeling approach provides very competitive results. Among the machine learning techniques used in the analysis, the additive SVM model provides the best results.

The rest of the article is organized as follows. Section 2 presents the market model used in the analysis as well as the SVM formulations used for constructing the credit risk assessment models. Section 3 is devoted to the empirical analysis, including the presentation of the data and the obtained results. Finally, Section 4 concludes the paper, summarizes the main findings of this research, and proposes some future research directions.

Section snippets

The market model

Market-based models for credit risk assessment are founded on the works of Black, Scholes and Merton (henceforth referred to as BSM) [9], [10]. In the BSM framework, a firm is assumed to have a simple debt structure, consisting of a single liability L that is due in time T. From the financial point of view, a firm is assumed to default on its debt, if the market value of its assets (A) at time T is lower than L (i.e., if the firm’s assets are not enough to cover its debt). In this context,
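The default condition described above (asset value A at time T falling below the liability L) can be illustrated with a minimal sketch of the textbook Merton-type default probability. This is not the paper's implementation: it assumes that asset values follow a geometric Brownian motion, and the function names and the parameters `drift`, `volatility`, and `horizon` are hypothetical names introduced here for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_default_probability(
    assets: float,
    liability: float,
    drift: float,
    volatility: float,
    horizon: float = 1.0,
) -> float:
    """Probability that the asset value at `horizon` is below `liability`.

    Assumption (not taken from the source): assets follow a geometric
    Brownian motion with annual `drift` and annual `volatility`.
    """
    if min(assets, liability, volatility, horizon) <= 0:
        raise ValueError("assets, liability, volatility and horizon must be positive")
    # Distance to default: standardized gap between expected log asset
    # value at the horizon and the log of the liability.
    distance_to_default = (
        math.log(assets / liability)
        + (drift - 0.5 * volatility ** 2) * horizon
    ) / (volatility * math.sqrt(horizon))
    # Default occurs when the standardized shock falls below -distance.
    return normal_cdf(-distance_to_default)
```

Under these assumptions, a firm whose assets are large relative to its liability receives a lower default probability than a firm whose assets barely cover it, all else being equal.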

Data and variables

Two data samples are used in the analysis. The first includes 1314 firm-year observations involving (non-financial) firms listed in the Athens Stock Exchange (ASE) over the period 2005–2010. For each year t in that period, the sample includes all firms traded throughout year t in ASE and their daily logarithmic returns over the whole year were used to estimate their PDs at the end of year t. The second sample consists of 10,716 firm-year observations for non-listed Greek firms from the
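For the listed sample, the daily logarithmic returns mentioned above can be computed as in the following sketch. It only shows how log returns and an annualized equity volatility might be obtained from a price series; how the paper maps these returns to PDs is not stated in this excerpt. The function names and the 252-trading-day convention are assumptions made for illustration.

```python
import math
import statistics


def daily_log_returns(prices):
    """Logarithmic returns ln(P_t / P_{t-1}) for consecutive daily prices."""
    if len(prices) < 2:
        raise ValueError("at least two prices are required")
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def annualized_volatility(prices, trading_days=252):
    """Sample standard deviation of daily log returns, scaled to one year.

    Assumption: `trading_days` = 252 is a common convention, not a value
    taken from the source.
    """
    returns = daily_log_returns(prices)
    if len(returns) < 2:
        raise ValueError("at least three prices are required")
    return statistics.stdev(returns) * math.sqrt(trading_days)
```

A series that grows by the same percentage every day has identical log returns and therefore an estimated volatility of essentially zero, which is a quick sanity check for the computation.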

Conclusion and future perspectives

This study examined the development and implementation of a framework for building corporate credit scoring models based solely on publicly available data. To this end, the BSM model was used to introduce a proxy definition of default, based on market data instead of the traditional approach based on the credit history of the firms. The market model’s estimates of default were linked to models combining publicly available financial data. These models can be easily employed to evaluate any firm

References (35)

E. Altman et al., Credit risk measurement: developments over the last 20 years, J. Banking Finance (1997)
L.C. Thomas, A survey of credit and behavioral scoring: forecasting financial risk of lending to consumers, Int. J. Forecasting (2000)
V. Agarwal et al., Comparing the performance of market-based and accounting-based bankruptcy prediction models, J. Banking Finance (2008)
M.-Y.L. Li et al., A hybrid bankruptcy prediction model with dynamic loadings on accounting-ratio-based and market-based information: a binary quantile regression approach, J. Empir. Finance (2010)
C.-C. Yeh et al., A hybrid KMV model, random forests and rough set theory approach for credit rating, Knowledge Based Syst. (2012)
D. Majumder, Inefficient markets and credit risk modeling: Why Merton’s model failed, J. Policy Model. (2006)
M.F. Dicle et al., Greek market efficiency and its international integration, J. Int. Finance Markets Inst. Money (2011)
D. Martens et al., Comprehensible credit scoring models using rule extraction from support vector machines, Eur. J. Oper. Res. (2007)
T. Bellotti et al., Support vector machines for credit scoring and discovery of significant features, Expert Syst. Appl. (2009)
S.-C. Huang, Using Gaussian process based kernel classifiers for credit rating forecasting, Expert Syst. Appl. (2011)
M. Crouhy et al., Prototype risk rating system, J. Banking Finance (2001)
T. Fawcett, Introduction to ROC analysis, Pattern Recogn. Lett. (2006)
A. Blöchlinger et al., Economic benefit of powerful credit scoring, J. Banking Finance (2006)
D. Papageorgiou et al., Credit rating systems: regulatory framework and comparative evaluation of existing methods
H.A. Abdou et al., Credit scoring, statistical techniques and evaluation criteria: a review of the literature, Intell. Syst. Acc. Finance Manage. (2011)
Y.M. Mensah, An examination of the stationarity of multivariate bankruptcy prediction models: a methodological study, J. Acc. Res. (1984)
S. Hillegeist et al., Assessing the probability of bankruptcy, Rev. Acc. Stud. (2004)
