An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring
Introduction
An important issue in financial decision-making is the timely and accurate prediction of business failure (Atiya, 2001, Zhang et al., 1999) (e.g. bankruptcy prediction and credit scoring). Credit scoring models make it possible to discriminate between a good credit group and a bad credit group (Chen & Huang, 2003). The benefits of developing a reliable credit scoring system are (Tsai and Wu, 2007, West, 2000):
- reducing the cost of credit analysis;
- enabling faster decisions;
- ensuring credit collections and diminishing possible risk.
Several financial decision-making methods based on machine learning (e.g. Atiya, 2001, Huang et al., 2004, Lee et al., 2006) use the multi-layer perceptron (MLP) (Haykin, 1999) as the classifier. Other classifiers that have been tested are the decision tree and the support vector machine. These studies show that machine learning based systems outperform the traditional (statistical) methods on bankruptcy prediction and credit scoring problems (Huang et al., 2004, Ong et al., 2005, Vellido et al., 1999, Wong and Selvi, 1998). Tsai and Wu (2007) compare a single MLP classifier with multiple classifiers and diversified multiple classifiers on three datasets, but conclude that there is no clear winner.
In Table 1, several machine learning based methods are compared.
The main drawbacks of the machine learning based methods proposed in the literature are (Tsai & Wu, 2007):
- In several works only one dataset is used to validate the proposed system.
- In several works only the accuracy is used to validate the proposed system (exceptions include Lee et al., 2006, Lee et al., 2002, Tsai and Wu, 2007).
- Few papers study ensembles of classifiers (Frosyniotis et al., 2003, Ghosh, 2002, Kang and Doermann, 2003, Roli et al., 2004) on both credit scoring and bankruptcy prediction problems (Tsai and Wu, 2007, Wong and Selvi, 1998).
In this paper, we improve on the results of Tsai and Wu (2007) through an in-depth study of ensembles of classifiers for bankruptcy prediction and credit scoring. We test four methods for creating an ensemble of classifiers and four different classifiers. We show that the Random Subspace method (RS) (Ho, 1998) improves the performance of the classifiers. In our tests the best stand-alone classifier is the MLP (as reported in several papers), but the best system is a Random Subspace ensemble of Levenberg–Marquardt neural nets.
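As a rough illustration of the Random Subspace idea, the sketch below trains each ensemble member on a randomly chosen subset of the features and combines the members by majority vote. It is not the configuration used in the paper: the base learner here is a toy nearest-centroid classifier standing in for the Levenberg–Marquardt neural net, the majority-vote fusion rule is an assumption, and the names `NearestCentroid`, `RandomSubspaceEnsemble`, `n_estimators` and `subspace_fraction` are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Toy base classifier (a stand-in, not the paper's neural net)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One mean vector per class
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every centroid
        diff = X[:, None, :] - self.centroids_[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


class RandomSubspaceEnsemble:
    """Random Subspace sketch: each member sees a random feature subset."""

    def __init__(self, n_estimators=25, subspace_fraction=0.5, seed=0):
        self.n_estimators = n_estimators            # assumed value
        self.subspace_fraction = subspace_fraction  # assumed value
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n_features = X.shape[1]
        k = max(1, int(round(self.subspace_fraction * n_features)))
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_estimators):
            # Draw k distinct feature indices for this member
            idx = rng.choice(n_features, size=k, replace=False)
            clf = NearestCentroid().fit(X[:, idx], y)
            self.members_.append((idx, clf))
        return self

    def predict(self, X):
        # votes has shape (n_estimators, n_samples)
        votes = np.stack(
            [clf.predict(X[:, idx]) for idx, clf in self.members_]
        )
        # Count the votes each class receives per sample (majority vote)
        counts = np.stack(
            [(votes == c).sum(axis=0) for c in self.classes_]
        )
        return self.classes_[counts.argmax(axis=0)]


if __name__ == "__main__":
    # Synthetic two-class data, for illustration only
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)
    ensemble = RandomSubspaceEnsemble(n_estimators=15).fit(X, y)
    print("training accuracy:", (ensemble.predict(X) == y).mean())
```

Because every member is trained on a different view of the feature space, the members tend to make different errors, which is what allows their fusion to improve on a single classifier. Any base learner exposing `fit` and `predict` could be substituted for the stand-in.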
The paper is organized as follows: Section 2 gives a brief review of the literature on multiple classifiers, Section 3 presents the experimental results, and Section 4 offers some concluding remarks.
Ensemble of classifiers
To improve on the performance of single-classifier approaches, the combination of multiple classifiers has been proposed in the field of machine learning (Fierrez-Aguilar et al., 2005, Bologna et al., 2002, Melville and Mooney, 2005). Multiple classifier systems combine a pool of classifiers such that their fusion achieves higher performance than the stand-alone classifiers. Hence, an ensemble of classifiers is a set of classifiers, whose individual
Experiments
In our experiments we have used the same datasets used in Tsai and Wu (2007):
- Australian credit.
- German credit.
- Japanese credit.
A summary of the characteristics of these datasets (number of attributes, number of examples, and number of classes) is reported in Table 2.
Each dataset was randomly divided into a training set (70%) and a testing set (30%). The process of training and testing was conducted
Conclusions
To our knowledge, this is the first paper that compares several ensemble methods for bankruptcy prediction and credit scoring. We show that an ensemble of classifiers may be used to boost the performance of a “stand-alone” classifier, and that the RS ensemble outperforms the other ensemble methods.
It is interesting to note that the performance of MLP is very similar to that obtained by the ensembles of MLP (as reported in Tsai & Wu (2007)) while the performance of the other classifiers (LMNC;
References (33)
- et al. Credit scoring and rejected instances reassigning through evolutionary computation techniques. Expert Systems with Applications (2003).
- et al. Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems (2004).
- et al. Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis (2006).
- et al. Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications (2002).
- et al. Switching class labels to generate classification ensembles. Pattern Recognition (2005).
- et al. Creating diversity in ensembles using artificial. Information Fusion: Special Issue on Diversity in Multiclassifier Systems (2005).
- et al. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications (2005).
- et al. Building credit scoring models using genetic programming. Expert Systems with Applications (2005).
- et al. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications (2005).
- et al. Neural networks in business: a survey of applications (1992–1998). Expert Systems with Applications (1999).