Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach
Introduction
The prediction of corporate bankruptcy has long been an important topic and has been studied extensively in the finance and management literature because it is an essential basis for risk management in financial institutions. Bankruptcy prediction models have used various statistical and artificial intelligence techniques, including discriminant analysis, logistic regression, decision trees, k-nearest neighbor, and artificial neural networks (ANNs) (see [1]). Among them, ANNs have become one of the most popular techniques for predicting corporate bankruptcy because of their high prediction accuracy. ANNs, however, have not been widely applied in financial companies because the models are generally difficult to build: many parameters must be set heuristically. Furthermore, there is a danger of overfitting, and it is usually difficult to explain why an ANN produces a specific result, i.e. its explanation ability is poor. Thus, there is a need for other artificial intelligence techniques that offer good explanation ability as well as high prediction performance.
Case-based reasoning (CBR) may be an alternative that relieves the above limitations of ANNs. It does not overfit in the way ANNs can, because it uses specific knowledge of previously experienced problems rather than generalized patterns learned from them [2]. Furthermore, a CBR system can be kept up to date because its case-base can be updated in real time, which is a very important feature for real-world applications.
Nevertheless, CBR has attracted little interest from researchers because its prediction accuracy is usually much lower than that of ANNs. Thus, there have been many studies aimed at enhancing the performance of CBR. Among them, mechanisms that enhance the case retrieval process, such as the selection of appropriate feature subsets, the selection of instance subsets, and the determination of feature weights, have been studied most frequently (see [3], [4], [5], [6], [7]).
One of the state-of-the-art techniques for CBR is the simultaneous optimization of these parameters. Most prior research tried to optimize them independently. However, considering these parameters simultaneously makes it possible to find a globally optimized CBR model, in which the parameters improve the prediction results synergistically.
This study proposes a novel hybrid approach that optimizes the weights of the features and the training instances simultaneously by genetic algorithms (GAs). To validate the usefulness of our model, we apply it to the real-world case of corporate bankruptcy prediction and review the results produced by our model.
The rest of the paper is organized as follows. Section 2 briefly reviews prior studies, and Section 3 proposes our research model, the simultaneous optimization of feature weights and relevant instances by the GA approach. Section 4 presents the research design and experiments, and Section 5 describes the empirical results and their implications. The final section presents the conclusions of the study.
Prior research
We first review prior studies on corporate bankruptcy prediction. We then examine the general concept of CBR and previous research on optimizing it. After that, we review recent studies on the simultaneous optimization of several parameters for CBR systems. Finally, we examine in detail the GA approach, the key method for simultaneous optimization.
Simultaneous optimization of feature weighting and instance selection using a genetic algorithm
This study proposes a novel CBR model whose feature weighting and instance selection are optimized globally in order to improve the prediction accuracy of typical CBR systems. Our model employs GA to select a relevant instance subset and to optimize the weight of each feature simultaneously, using the reference case-base and the test case-base. We call it GOCBR (Global Optimization of feature weighting and instance selection using GA for CBR). The flowchart of GOCBR is shown in Fig. 2.
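The idea of encoding feature weights and instance selection in a single chromosome can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the real-valued encoding with a 0.5 threshold for instance genes, the weighted Euclidean 1-nearest-neighbor retrieval, the GA operators (binary tournament, uniform crossover, resampling mutation, elitism), the function names, and the synthetic data.

```python
import numpy as np

def weighted_knn_predict(train_X, train_y, query_X, weights, k=1):
    """Majority vote of the k nearest cases under a feature-weighted
    Euclidean distance (an assumed similarity measure)."""
    diff = query_X[:, None, :] - train_X[None, :, :]
    dist = np.sqrt((weights * diff ** 2).sum(axis=2))
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) >= 0.5).astype(int)

def fitness(chrom, n_feat, train_X, train_y, test_X, test_y):
    """Accuracy on the test case-base for one chromosome.
    Genes [0, n_feat) are feature weights; the remaining genes select
    training instances when they are >= 0.5 (assumed encoding)."""
    weights = chrom[:n_feat]
    mask = chrom[n_feat:] >= 0.5
    if not mask.any():
        return 0.0  # an empty case-base cannot classify anything
    pred = weighted_knn_predict(train_X[mask], train_y[mask], test_X, weights)
    return float((pred == test_y).mean())

def run_ga(train_X, train_y, test_X, test_y,
           pop_size=30, generations=20, mutation_rate=0.05, seed=0):
    """Search feature weights and the instance subset at the same time."""
    rng = np.random.default_rng(seed)
    n_train, n_feat = train_X.shape
    length = n_feat + n_train

    def evaluate(pop):
        return np.array([fitness(c, n_feat, train_X, train_y, test_X, test_y)
                         for c in pop])

    pop = rng.random((pop_size, length))
    scores = evaluate(pop)
    for _ in range(generations):
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover with a randomly permuted partner.
        partners = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, length)) < 0.5
        children = np.where(take_parent, parents, partners)
        # Mutation: resample a gene uniformly in [0, 1).
        mutate = rng.random((pop_size, length)) < mutation_rate
        children = np.where(mutate, rng.random((pop_size, length)), children)
        # Elitism: carry the current best chromosome over unchanged.
        children[0] = pop[scores.argmax()]
        pop, scores = children, evaluate(children)
    best = pop[scores.argmax()]
    return best[:n_feat], best[n_feat:] >= 0.5, float(scores.max())

# Synthetic stand-in data (not the bank data used in the study).
_rng = np.random.default_rng(1)
X = _rng.normal(size=(120, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
train_X, train_y, test_X, test_y = X[:80], y[:80], X[80:], y[80:]

if __name__ == "__main__":
    w, mask, acc = run_ga(train_X, train_y, test_X, test_y)
    print("weights:", np.round(w, 2))
    print("selected instances:", int(mask.sum()), "of", len(mask))
    print("accuracy on test case-base:", acc)
```

Because the fitness is measured on the test case-base, a sketch like this would need separate, unseen cases to estimate how well the selected weights and instances generalize.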
The detailed
Application data
The application data used in this study consist of financial ratios and the bankrupt or non-bankrupt status of the corresponding corporations. The data were collected from one of the largest commercial banks in Korea. The sample consists of 1335 bankrupt companies in heavy industry that filed for bankruptcy between 1996 and 2000, and 1335 solvent companies in heavy industry between 1999 and 2000. Thus, the total sample comprises 2670 companies.
The financial status for each company is
The results of GA-optimized CBRs: FSCBR, FWCBR, ISCBR, FISCBR, and GOCBR
Table 4 shows the final selected parameters of each model. For GOCBR, we obtain 15 optimal feature weights and 1445 optimal training instances that maximize the prediction result for the test set. Because there are 1602 training samples in total, GOCBR selects about 90.26% of the case-base as its optimal instance subset. As Table 4 shows, GOCBR selects more instances than ISCBR (71.66%) and FISCBR (53.12%).
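The selection ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts and percentages stated in the text; the helper names are hypothetical. It suggests that 1445 out of 1602 corresponds to about 90.20%, whereas the quoted 90.26% corresponds to 1446 instances, so one of the two GOCBR figures may be off by one in this excerpt. The ISCBR and FISCBR percentages map cleanly to 1148 and 851 instances.

```python
TOTAL_TRAINING_CASES = 1602  # total training samples stated in the text

def selection_ratio(selected, total=TOTAL_TRAINING_CASES):
    """Percentage of the training case-base kept as the instance subset."""
    return 100.0 * selected / total

def implied_count(percent, total=TOTAL_TRAINING_CASES):
    """Nearest whole number of instances matching a reported percentage."""
    return round(percent / 100.0 * total)

gocbr_ratio = selection_ratio(1445)
implied = {
    name: implied_count(pct)
    for name, pct in [("GOCBR", 90.26), ("ISCBR", 71.66), ("FISCBR", 53.12)]
}

if __name__ == "__main__":
    print(f"GOCBR ratio from 1445 instances: {gocbr_ratio:.2f}%")
    print("instance counts implied by the reported percentages:", implied)
```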
The feature weights in Table 4 are not standardized,
Conclusions
We have proposed a new hybrid CBR model using GA, called GOCBR. Our proposed model optimizes feature weighting and instance selection simultaneously. By selecting optimal instances, it may reduce noise and distorted cases that lead to erroneous predictions. Moreover, our model may also find appropriate nearest neighbors for CBR by applying optimal feature weights to the similarity calculation, which may enhance prediction accuracy. Compared to other models such as TYCBR, FSCBR, FWCBR, and ISCBR as well as
References (59)
- et al. A method of similarity metrics for structured representations. Expert Systems with Applications (1997)
- et al. Case-based reasoning supported by genetic algorithms for corporate bond rating. Expert Systems with Applications (1999)
- et al. Maintaining case-based reasoning systems using a genetic algorithms approach. Expert Systems with Applications (2001)
- A case-based customer classification approach for direct marketing. Expert Systems with Applications (2002)
- Self organizing neural networks for financial diagnosis. Decision Support Systems (1996)
- et al. Corporate distress diagnosis: comparisons using linear discriminant analysis and neural networks. Journal of Banking and Finance (1994)
- et al. Bankruptcy prediction using neural networks. Decision Support Systems (1994)
- et al. Integration of case-based forecasting, neural network and discriminant analysis for bankruptcy prediction. Expert Systems with Applications (1996)
- et al. Hybrid neural network models for bankruptcy predictions. Decision Support Systems (1996)
- et al. Bankruptcy prediction using case-based reasoning, neural network and discriminant analysis. Expert Systems with Applications (1997)