A hybrid device for the solution of sampling bias problems in the forecasting of firms’ bankruptcy

https://doi.org/10.1016/j.eswa.2012.01.135

Abstract

This paper proposes a new approach to forecasting firms’ bankruptcy. Our proposal is a hybrid method in which sound companies are divided into clusters using Self-Organizing Maps (SOM) and each cluster is then replaced by a director vector that summarizes its members. Once the clustered companies have been replaced by director vectors, we estimate a classification model through Multivariate Adaptive Regression Splines (MARS). To test the model we considered a real setting of Spanish enterprises from the construction sector. With this procedure we intend to overcome the sampling-bias problems that matched-pairs models often suffer from. We estimated two benchmark models: a back-propagation neural network and a simple MARS model. Our results show that the proposed hybrid approach is much more accurate than the benchmark techniques at identifying bankrupt companies.

Highlights

► Hybrid methods are superior to single models for financial classification tasks.
► Our method clusters companies through Self-Organizing Maps.
► Then each cluster is replaced by a director vector.
► A Multivariate Adaptive Regression Splines (MARS) model is estimated.
► In a real setting, our system outperforms both a single MARS and a neural network.

Introduction

Correct resource allocation decisions are critical to guarantee the survival of banks and other lenders. Bankruptcy forecasting models are therefore key tools to help bank managers and officers in their investment and lending decisions. From the late 1960s onwards many models have been developed and tested. In recent years the importance of such systems has grown further because of the current financial crisis, which demands even more careful management of financial resources. Furthermore, under the Basel II Accord recommendations (Bank for International Settlements (BIS), 2006), banks which choose to develop their own empirical model to quantify the capital required for credit risk (internal ratings-based approach) are required to maintain less capital than those using the standardized approach. An accurate device for estimating loan default probabilities thus lets a financial entity minimize the resources held as reserves and therefore reach a higher level of profitability.

According to Sueyoshi and Goto (2009a), research on bankruptcy-based performance assessment can be classified into three broad categories. The first comprises studies centered on a particular model, which test how that model performs in comparison with others. The second comprises research focused on the selection of an appropriate set of variables to implement a particular model. The third comprises papers which investigate the bankruptcy process.

Among these categories, the first is the one which has received the most attention from researchers. The tested models are mainly statistical methodologies (for a review of the most outstanding studies see Balcaen and Ooghe, 2006, Keasey and Watson, 1991, among others) and artificial intelligence techniques (for a review see, e.g., Aziz and Dar, 2006, Ravi Kumar and Ravi, 2007).

Ravi Kumar and Ravi (2007) discuss the models which have been most frequently used in studies focused on insolvency prediction via intelligent systems. These models are Fuzzy Logic (FL), Neural Networks (NN), Genetic Algorithms (GA), Case-Based Reasoning Systems (CBR), Rough Sets (RS), Support Vector Machines (SVM), Decision Trees (DT), Data Envelopment Analysis (DEA) and Hybrid Systems (HS).

Among these, HS are the most promising. They combine two or more intelligent techniques in various ways to exploit the advantages of each. HS have received considerable attention from researchers as they amplify the advantages of the intelligent techniques while simultaneously nullifying their disadvantages. Most HS require a considerable amount of data to reach accurate estimates. This is no longer a problem, as publicly available databases contain financial information on listed and unlisted firms.

However, studies using HS for bankruptcy prediction suffer from a drawback: most of them estimate the model on a sample in which non-failed companies are underrepresented. In most cases a matched-pairs design is used. The selection of non-failed firms is arbitrary, so the model achieves a high in-sample percentage of correct classifications but is likely to be inaccurate when predicting failure for new cases drawn from a real population.

Another strategy is to consider a “real” population as the sample, that is, to consider all the companies for which financial information is available. However, as only a very small percentage of firms enter financial distress in a normal economic situation, such samples are very unbalanced. This causes coefficient instability and leads to poor predictive performance.

As an alternative to both strategies we propose an HS model in which, starting from a real population of firms, data are preprocessed to summarize the information on healthy firms. The initial unbalanced sample is thus transformed into a balanced one which retains the main features of the healthy firms. Self-Organizing Maps (SOM) are used in this stage. A classification device is then developed on the transformed sample, for which we use the Multivariate Adaptive Regression Splines (MARS) approach. The results are compared with benchmarks which are popular in the bankruptcy prediction literature. As an application of the combined approach, this paper uses it to assess the solvency of Spanish construction firms.
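The balancing stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SOM is a small hand-written NumPy version, the director vector of each cluster is assumed to be the SOM codebook vector of each non-empty node, and the data, grid size, learning rate and function names (`train_som`, `director_vectors`) are hypothetical. The MARS classification stage is omitted.

```python
import numpy as np

def train_som(X, rows=4, cols=4, epochs=5, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small rectangular SOM; returns a codebook of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook with randomly chosen observations.
    codebook = X[rng.choice(len(X), size=rows * cols, replace=False)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
            bmu = np.argmin(((codebook - X[i]) ** 2).sum(axis=1))  # best-matching unit
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weights
            codebook += lr * h[:, None] * (X[i] - codebook)
            step += 1
    return codebook

def director_vectors(X, codebook):
    """Keep one codebook vector per non-empty cluster (assumed 'director vector')."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    used = np.unique(d2.argmin(axis=1))
    return codebook[used]

# Hypothetical, simulated financial ratios: many healthy firms, few bankrupt ones.
rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(2000, 3))
bankrupt = rng.normal(1.5, 1.0, size=(20, 3))

# Replace the healthy firms by their director vectors and keep every bankrupt firm.
prototypes = director_vectors(healthy, train_som(healthy))
X_balanced = np.vstack([prototypes, bankrupt])
y_balanced = np.r_[np.zeros(len(prototypes)), np.ones(len(bankrupt))]
print(len(healthy), "healthy firms summarised by", len(prototypes), "director vectors")
```

With a 4 x 4 grid, at most 16 director vectors stand in for the 2,000 simulated healthy firms, so the two classes end up with comparable sizes before a classifier is estimated.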

The remainder of the paper is structured as follows. Section 2 reviews prior studies on bankruptcy prediction using HS. Section 3 describes the construction of the database. Section 4 describes the algorithm and the analytical procedures we used. Section 5 comments on the main results, including the benchmark techniques applied. Finally, Section 6 presents the summary and main conclusions, along with some avenues for further research.

Section snippets

Prior bankruptcy research using hybrid systems

Basically, there are four types of HS which have been applied to financial distress prediction:

  • Hybrid Algorithms (HA), where two or more intelligent algorithms are tightly integrated to form a new classification device (e.g., GA-trained NN, neuro-fuzzy systems).

  • Ensemble Classifiers (EC), which consist of multiple single classifiers whose decisions are combined to form that of the overall system, usually by applying a voting scheme.

  • Feature Selectors (FS). In these systems, an algorithm is used

The database

The construction sector in Spain is the second largest of the EU-27 countries in terms of value added; it comprised 18% of the EU-27 total in 2007. In terms of the number of persons employed, Spain had the largest construction sector in the EU-27 in 2007: 2.9 million persons, a little less than one fifth (19.5%) of the construction workforce in the EU-27. Of the

The proposed hybrid model

The model proposed in the present research combines MARS with a clustering technique, SOM mapping, to obtain a MARS model that is trained only on the companies considered representative of each cluster. The steps of the algorithm are explained in more detail below.

  • Step 1:

    Study of the similarities among the bankrupt companies by means of Mahalanobis distances. The Mahalanobis distance was calculated for all the bankrupt companies.

  • Step 2:

    Those
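Step 1 can be illustrated with a short sketch. It assumes that the distance is measured from each bankrupt firm to the centroid of the bankrupt group, using that group's sample covariance; the excerpt does not state the reference point, so this is an assumption. The function name `mahalanobis_to_centroid` and the simulated ratios are hypothetical.

```python
import numpy as np

def mahalanobis_to_centroid(X):
    """Mahalanobis distance of each row of X to the sample centroid of X."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # sample covariance (ddof=1)
    cov_inv = np.linalg.pinv(cov)          # pseudo-inverse guards against a singular covariance
    # d_i^2 = (x_i - mu)^T S^{-1} (x_i - mu), evaluated for every row i
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0))

# Hypothetical data: 256 bankrupt firms described by 4 simulated financial ratios.
rng = np.random.default_rng(0)
ratios = rng.normal(size=(256, 4))
ratios[0] += 8.0                           # one artificially atypical firm

d = mahalanobis_to_centroid(ratios)
print("most atypical firm:", int(d.argmax()))
```

Firms with a large distance are unlike the rest of the bankrupt group once the scale and correlation of the ratios are taken into account, which is what makes this measure suitable for studying similarity among firms.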

Results

In this section we detail the results of the algorithm. As stated above, the original database is formed by 63,107 companies of which 256 went bankrupt. All the steps of the algorithm were applied five times, using in each run 80% of the data for training (50,485 non-bankrupt companies and 204 bankrupt) and the other 20% for validation (12,622 non-bankrupt companies and 52 bankrupt). We also detail the results of applying the proposed benchmark techniques.
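The five 80/20 partitions can be reproduced in outline as follows. The split figures quoted above add up to 63,107 non-bankrupt and 256 bankrupt firms, so the sketch assumes those class sizes. It also assumes that each partition is a stratified random split, which the text does not state; the function name `stratified_holdout` and the seeds are hypothetical.

```python
import random

def stratified_holdout(labels, train_frac=0.8, seed=0):
    """Split indices into training and validation sets, class by class."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))   # rounded down within each class
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return train, valid

# Class sizes implied by the split figures in the text (0 = non-bankrupt, 1 = bankrupt).
labels = [0] * 63107 + [1] * 256

# Five independent runs, one per seed.
for run in range(5):
    train, valid = stratified_holdout(labels, seed=run)
    train_bankrupt = sum(labels[i] for i in train)
    valid_bankrupt = sum(labels[i] for i in valid)
    print(run, len(train) - train_bankrupt, train_bankrupt,
          len(valid) - valid_bankrupt, valid_bankrupt)
```

Splitting within each class keeps the very small share of bankrupt firms the same in both partitions, which matters when only 256 positive cases are available.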

Summary, concluding remarks and further research

This paper proposes a new approach to forecasting firms’ bankruptcy. Our proposal is a hybrid method in which sound companies are divided into clusters according to their financial similarities and each cluster is then replaced by a director vector that summarizes its members. To do this, we use SOM mapping. Once the clustered companies have been replaced by director vectors, we estimate a classification model through MARS.
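For readers unfamiliar with MARS, the sketch below shows its basic building block: a pair of mirrored hinge functions, max(0, x - t) and max(0, t - x), placed at a knot t and combined by least squares (Friedman, 1991). It is a toy, single-variable regression with one forward step, not the classification model estimated in the paper; the data and the function names `hinge_pair` and `fit_single_knot` are hypothetical.

```python
import numpy as np

def hinge_pair(x, knot):
    """The two mirrored hinge basis functions for one knot."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def fit_single_knot(x, y, candidate_knots):
    """Choose the knot whose hinge pair gives the lowest squared error."""
    best = None
    for t in candidate_knots:
        right, left = hinge_pair(x, t)
        B = np.column_stack([np.ones_like(x), right, left])  # intercept + hinge pair
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        sse = float(((B @ coef - y) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, float(t), coef)
    return best

# Toy response that is flat up to x = 4 and then rises with slope 2.
x = np.linspace(0.0, 10.0, 101)
y = 1.0 + 2.0 * np.maximum(0.0, x - 4.0)

sse, knot, coef = fit_single_knot(x, y, candidate_knots=x[1:-1])
print("selected knot:", knot, "coefficients:", coef)
```

A full MARS fit repeats this search over all variables and candidate knots, allows products of hinge functions, and then prunes terms; only the knot search for one variable is shown here.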

For the test of the model we considered a real setting

Acknowledgements

The part of this research conducted by Javier De Andrés and Pedro Lorca was partially supported by the research grant ECO-2008-00242 from the Spanish Ministry of Science and Innovation. The part of this research conducted by Fernando Sánchez-Lasheras and Francisco Javier de Cos-Juez was partially supported by the research project AYA2010-18153 (Ministry of Science and Innovation - Government of Spain).

References (87)

  • H.A. Abdou

    Genetic programming for credit scoring: The case of Egyptian public sector banks

    Expert Systems with Applications

    (2009)
  • B.S. Ahn et al.

    The integrated methodology of rough set theory and artificial neural network for business failure prediction

    Expert Systems with Applications

    (2000)
  • H. Ahn et al.

    Bankruptcy prediction modelling with hybrid case-based reasoning and genetic algorithms approach

    Applied Soft Computing

    (2009)
  • P. Alam et al.

    The use of fuzzy clustering algorithm and self-organizing neural networks for identifying potentially failing banks: an experimental study

    Expert Systems with Applications

    (2000)
  • E. Alfaro-Cid et al.

    Comparing multiobjective evolutionary ensembles for minimizing type I and II errors for bankruptcy prediction

    IEEE Congress on Evolutionary Computation

    (2008)
  • E. Alfaro et al.

    Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks

    Decision Support Systems

    (2008)
  • E.I. Altman

    Financial ratios, discriminant analysis and the prediction of the corporate bankruptcy

    Journal of Finance

    (1968)
  • E.I. Altman

    Corporate financial distress and bankruptcy

    (1993)
  • P. Avishek et al.

    Development of a hybrid methodology for dimensionality reduction in Mahalanobis-Taguchi system using Mahalanobis distance and binary particle swarm optimization

    Expert Systems with Applications

    (2010)
  • M.A. Aziz et al.

    Predicting corporate bankruptcy: Where we stand?

    Corporate Governance

    (2006)
  • S. Balcaen et al.

    35 years of studies on business failure: an overview of the classic statistical methodologies and their related problems

    The British Accounting Review

    (2006)
  • Bank for International Settlements (BIS)

    International convergence of capital measurement and capital standards. A revised framework

    (2006)
  • L. Becchetti et al.

    Bankruptcy risk and productive efficiency in manufacturing firms

    Journal of Banking and Finance

    (2003)
  • J. Begley et al.

    Bankruptcy classification errors in the 1980s: An empirical analysis of Altman’s and Ohlson’s models

    Review of Accounting Studies

    (1996)
  • M. Bhargava et al.

    Predicting bankruptcy in the retail sector: an examination of the validity of key measures of performance

    Journal of Retailing and Consumer Services

    (1998)
  • A. Boyacioglu et al.

    Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey

    Expert Systems with Applications

    (2009)
  • L. Breiman et al.

    Classification and regression trees

    (1984)
  • A. Chaudhuri et al.

    Fuzzy support vector machine for bankruptcy prediction

    Applied Soft Computing

    (2011)
  • W. Chen et al.

    Mining the customer credit using hybrid support vector machine technique

    Expert Systems with Applications

    (2009)
  • L.H. Chen et al.

    MARS-based research of personal credit scoring: Verification of Chinese data

    International Conference on Management Science and Engineering

    (2006)
  • S. Cho et al.

    A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction

    Expert Systems with Applications

    (2010)
  • Ch.L. Chuang et al.

    Constructing a reassigning credit scoring model

    Expert Systems with Applications

    (2009)
  • S. Davalos et al.

    The application of a neural network approach to predicting bankruptcy risks facing the major US air carriers: 1979–1996

    Journal of Air Transport Management

    (1999)
  • J. De Andrés et al.

    Bankruptcy forecasting: A hybrid approach using fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS)

    Expert Systems with Applications

    (2011)
  • Defu, Z., Leung, S. C. H., & Zhimei, Y. (2008). A decision tree scoring model based on genetic algorithm and k-means...
  • A. Foglia et al.

    The definition of the grading scales in banks’ internal rating systems

    Economic Notes

    (2001)
  • R.D. Foreman

    A logistic analysis of bankruptcy within the US local telecommunications industry

    Journal of Economics and Business

    (2003)
  • J.H. Friedman

    Multivariate adaptive regression splines

    Annals of Statistics

    (1991)
  • J.H. Friedman et al.

    An introduction to multivariate adaptive regression splines

    Statistical Methods in Medical Research

    (1995)
  • J.S. Grice et al.

    Test of generalizability of Altman’s bankruptcy prediction model

    Journal of Business Research

    (2001)
  • Z. Gu

    Analyzing bankruptcy in the restaurant industry: A multiple discriminant model

    International Journal of Hospitality Management

    (2002)
  • T. Hastie et al.

    Generalized additive models

    (1990)
  • T. Hastie et al.

    The Elements of Statistical Learning

    (2003)
  • Hu, G., & Wang, Y. (2008). The application of data mining to customer credit analysis in medicament enterprise....
  • C.L. Huang et al.

    Credit scoring with a data mining approach based on support vector machines

    Expert Systems with Applications

    (2007)
  • Ch.L. Huang et al.

    A GA-based feature selection and parameters optimization for support vector machines

    Expert Systems with Applications

    (2006)
  • Ch. Hung et al.

    A selective ensemble based on expected probabilities for bankruptcy prediction

    Expert Systems with Applications

    (2009)
  • N.Ch. Hsieh

    Hybrid mining approach in the design of credit scoring models

    Expert Systems with Applications

    (2005)
  • International Finance Corporation (IFC) (2010). Doing Business 2011. Making a difference for entrepreneurs. Washington:...
  • K. Jeong et al.

    Stream modification patterns in a river basin: Field survey and self-organizing map (SOM) application

    Ecological Informatics

    (2010)
  • D. Karthik Chandra et al.

    Failure prediction of dotcom companies using hybrid intelligent techniques

    Expert Systems with Applications

    (2009)
  • K. Keasey et al.

    Financial distress prediction models: A review of their usefulness

    British Journal of Management

    (1991)
  • H. Kim et al.

    Predicting restaurant bankruptcy: A logit model in comparison with a discriminant model

    Journal of Hospitality & Tourism Research

    (2006)