Combining market and accounting-based models for credit scoring using a classification scheme based on support vector machines

https://doi.org/10.1016/j.amc.2014.02.028

Highlights

  • Combination of an option-based model with accounting data for credit risk modeling.

  • Application of market model to non-listed firms.

  • Use of a novel additive support vector machines model.

Abstract

Credit risk rating is an important issue for both financial institutions and companies, especially in periods of economic recession. Many different approaches and methods have been developed over the years. The aim of this paper is to create a credit risk rating model using a machine learning methodology that combines accounting data with the option-based approach of Black, Scholes, and Merton. The model is built on data for companies listed on the Greek stock exchange, but it is also shown to provide accurate results for non-listed firms. Linear and nonlinear support vector machines are used for model building, as well as an innovative additive modeling approach, which enables the construction of comprehensible and accurate credit scoring models.

Introduction

Credit risk refers to the probability that a client will not be able to meet his/her debt obligations (default). Over the years, many factors have contributed to the increasing importance of accurate credit risk measurement. Altman and Saunders [1] list five main issues, which are still valid in the current context: (i) a worldwide structural increase in the number of defaults, (ii) a trend towards disintermediation by the highest quality and largest borrowers, (iii) more competitive margins on loans, (iv) a declining value of real assets (and thus collateral) in many markets, and (v) a dramatic growth of high risk exposures including credit derivatives. Credit risk measurement is nowadays a critical issue as demonstrated by the recent outbreak of the global credit crisis in 2007–2008.

In a credit risk management context, the accurate estimation of the probability of default is crucial. Credit rating models (CRMs) are widely used for that purpose. CRMs evaluate the creditworthiness of obligors, estimate the probabilities of default, and classify obligors into risk groups. In a corporate credit granting context, most CRMs combine key financial (accounting) and non-financial data into an aggregate index indicating the credit risk of the firms. Such models can be constructed with a variety of statistical, data mining, and operations research techniques (e.g., logistic regression, neural networks, support vector machines, rule induction algorithms, and multicriteria decision making). Comprehensive reviews of this line of research can be found in [2], [3], [4]. Despite their success and popularity, traditional credit scoring models are mostly static and based on historical accounting data, which may fail to adequately represent the future of the firms and the trends in the business environment [1], [5]. This is particularly important in the context of economic turmoil, where exogenous conditions deteriorate rapidly over a short period, thus affecting corporate activity and leading to increased credit risk levels throughout the market. Mensah [6] and Hillegeist et al. [7] also discuss issues related to accounting standards and practices, which affect the quality of the information that financial statements provide.

The shortcomings of accounting-based credit scoring models have led to the consideration of a wide variety of alternative approaches (comprehensive overviews can be found in [1], [8]). Among them, structural models have attracted considerable interest. Structural models use stock exchange data to assess the probability of default [9], [10]. Stock prices reflect all the information related to the current status of the firms as well as the investors’ expectations about their future prospects [5]. Furthermore, market data are constantly updated in accordance with new information that becomes available about the operation of firms and the environment in which they operate. These features of market data and models indicate that they may be better suited for default prediction and credit risk measurement. Indeed, several studies provide empirical results in support of market models in the context of credit risk modeling and bankruptcy prediction [5], [7]. Market models have also been shown to contribute to the construction of improved hybrid systems in combination with accounting-based models [11], [12].

Despite their strong theoretical grounds and good predictive power, market models are limited to firms listed in stock exchanges. Therefore, their extension to non-listed firms has attracted some interest over the past decade. Moody’s KMV RiskCalc™ model [13] is a commercial implementation, which has been employed in several countries with positive results [14], [15]. Altman et al. [16] used US data to examine the potential of developing multivariate regression models providing estimates for the probability of default implied by a market model. The authors found that this approach provides similar results to default prediction models, thus concluding that both approaches should be treated as complementary sources of information.

This study extends the results of Altman et al. [16] by investigating the applicability of a market-based credit risk modeling approach in a context where the hypotheses of market efficiency may be invalid [17]. In particular, we test whether a definition of default on the basis of a market model can be employed to build a credit scoring model for non-listed firms and compare the results to a default prediction model fitted on historical default data. The analysis is based on data from Greece over the period 2005–2010 using samples of listed and non-listed firms. The Greek case provides a challenging context for two main reasons. First, after flourishing at the end of the 1990s, the Greek stock market entered a period characterized by increasing volatility, decreasing liquidity, and high market concentration, with a few large capitalization companies dominating the market. These features became even clearer during the international credit crisis and the subsequent sovereign debt crisis that hit the country, thus putting into serious question the efficiency of the Greek stock market [18]. Second, the crisis had a particularly strong effect on the Greek economy, with a sharp deterioration of the general economic and business conditions, which led to an unprecedented increase in the number of defaults and bankruptcies over a very short period of time. Thus, credit risk management becomes a challenging issue in this context, and the peculiarities of the Greek case cast doubt on whether an approach grounded in a market model could actually provide useful results.

On the methodological side, non-parametric machine learning techniques are employed based on the framework of support vector machines (SVMs). The analysis is performed in two stages. First, a market model is used to assess the probability of default for listed companies and classify them into risk groups under different risk-taking scenarios. Risk assessment and classification models are then developed using linear and nonlinear support vector machines, as well as a recently developed innovative additive SVM model that is well suited to the requirements of credit rating systems. Logistic regression is also employed for comparative purposes and feature selection. The developed models are applied to a sample of non-listed firms. The comparison against traditional credit scoring models fitted on historical default data shows that the market-based modeling approach provides very competitive results. Among the machine learning techniques used in the analysis, the additive SVM model provides the best results.

The rest of the article is organized as follows. Section 2 presents the market model used in the analysis as well as the SVM formulations used for constructing the credit risk assessment models. Section 3 is devoted to the empirical analysis, including the presentation of the data and the obtained results. Finally, Section 4 concludes the paper, summarizes the main findings of this research, and proposes some future research directions.

Section snippets

The market model

Market-based models for credit risk assessment are founded on the works of Black, Scholes and Merton (henceforth referred to as BSM) [9], [10]. In the BSM framework, a firm is assumed to have a simple debt structure, consisting of a single liability L that is due in time T. From the financial point of view, a firm is assumed to default on its debt, if the market value of its assets (A) at time T is lower than L (i.e., if the firm’s assets are not enough to cover its debt). In this context,
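The default condition stated above (asset value A below the liability L at time T) can be illustrated numerically. The following is a minimal sketch, not the paper's implementation, since the snippet does not show the exact formulation used. It assumes the standard Merton setup in which A follows a geometric Brownian motion; the function names, the `drift` parameter (which could be an expected asset return or a risk-free rate, depending on the variant), and all numeric inputs are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_default_probability(
    assets: float,
    liability: float,
    drift: float,
    asset_vol: float,
    horizon: float = 1.0,
) -> float:
    """Probability that assets fall below the liability at the horizon.

    Assumption (not confirmed by the source): assets follow a geometric
    Brownian motion with constant drift and volatility, so that
    ln(A_T) is normally distributed.
    """
    if min(assets, liability, asset_vol, horizon) <= 0:
        raise ValueError("assets, liability, asset_vol and horizon must be positive")
    # Distance to default: standardized gap between expected log assets and log debt.
    distance_to_default = (
        math.log(assets / liability) + (drift - 0.5 * asset_vol**2) * horizon
    ) / (asset_vol * math.sqrt(horizon))
    return normal_cdf(-distance_to_default)


# Illustrative inputs only (made-up values, not taken from the paper's data).
example_pd = merton_default_probability(
    assets=120.0, liability=100.0, drift=0.05, asset_vol=0.25, horizon=1.0
)
print(f"Estimated probability of default: {example_pd:.4f}")
```

In this sketch, a higher liability relative to assets, or a higher asset volatility for a firm whose assets comfortably exceed its debt, raises the estimated probability of default. In practice the asset value and its volatility are not observed directly and have to be inferred from market data, a step this example leaves out.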

Data and variables

Two data samples are used in the analysis. The first includes 1314 firm-year observations involving (non-financial) firms listed on the Athens Stock Exchange (ASE) over the period 2005–2010. For each year t in that period, the sample includes all firms traded on the ASE throughout year t; their daily logarithmic returns over the whole year were used to estimate their probabilities of default (PDs) at the end of year t. The second sample consists of 10,716 firm-year observations for non-listed Greek firms from the
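As a rough illustration of how daily logarithmic returns can feed a market model, the sketch below computes log returns from a price series and annualizes their sample standard deviation. It is an assumption that an annualized volatility of this kind is the input of interest; the snippet does not state the estimator the authors used. The helper names, the 250-trading-day convention, and the price series are hypothetical.

```python
import math


def daily_log_returns(prices):
    """Logarithmic returns ln(P_t / P_{t-1}) for consecutive daily prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def annualized_volatility(returns, trading_days=250):
    """Sample standard deviation of daily returns, scaled to a yearly horizon.

    The trading_days value is an assumed convention, not taken from the source.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("at least two returns are required")
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(variance * trading_days)


# Made-up closing prices, for illustration only.
prices = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6]
returns = daily_log_returns(prices)
print(f"Annualized volatility: {annualized_volatility(returns):.4f}")
```

Log returns are additive over time, so their sum over the year equals the log of the ratio between the last and first price, which makes them convenient for year-by-year estimation of this kind.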

Conclusion and future perspectives

This study examined the development and implementation of a framework for building corporate credit scoring models based solely on publicly available data. To this end, the BSM model was used to introduce a proxy definition of default, based on market data instead of the traditional approach based on the credit history of the firms. The market model’s estimates of default were linked to models combining publicly available financial data. These models can be easily employed to evaluate any firm

References (35)

  • M. Crouhy et al.

    Prototype risk rating system

    J. Banking Finance

    (2001)
  • T. Fawcett

    Introduction to ROC analysis

    Pattern Recogn. Lett.

    (2006)
  • A. Blöchlinger et al.

    Economic benefit of powerful credit scoring

    J. Banking Finance

    (2006)
  • D. Papageorgiou et al.

    Credit rating systems: regulatory framework and comparative evaluation of existing methods

  • H.A. Abdou et al.

    Credit scoring, statistical techniques and evaluation criteria: a review of the literature

    Intell. Syst. Acc. Finance Manage.

    (2011)
  • Y.M. Mensah

    An examination of the stationarity of multivariate bankruptcy prediction models: a methodological study

    J. Acc. Res.

    (1984)
  • S. Hillegeist et al.

    Assessing the probability of bankruptcy

    Rev. Acc. Stud.

    (2004)