Comparative analysis of data mining methods for bankruptcy prediction
Highlights
► Decision tree model advantages with respect to usability.
► Comparison of decision tree models for bankruptcy data.
► Adjusting minimum support to yield comprehensible rule sets.
Introduction
Bankruptcy prediction has been a focus of study in business analytics because of the importance of accurate and timely strategic business decisions. Even though the accuracy of the prediction model is a very important criterion, understandability and transportability of the model are also important. The accurate prediction of bankruptcy has been a critical issue to shareholders, creditors, policy makers, and business managers.
There is a wealth of research that has been applied to this field [6], [9], [30], [32], [39], [42], both in finance and in other fields [38]. Among the thousands of refereed journal articles, many recent studies have applied neural networks (NNs) [1], [3], [18], [19], [23], [24], [27], [33], [34], [43], [44], [46], [48]. Another popular approach is decision trees (DTs) [10], [36], [41], [49]. Support vector machines (SVMs) have been proposed for smaller datasets with highly nonlinear relationships [12], [15], [21], [35], [40].
The vast majority of studies in this domain have focused on NNs, and on how well they fit data (fidelity [22]) compared to their statistical counterpart (i.e., logistic regression). However, neural network models are black boxes [4], [50], lacking transparency (seeing what the model is doing, or comprehensibility) and transportability (being able to easily deploy the model into a decision support system for new cases). We argue that decision trees (DTs) can be as accurate while providing the transparency and transportability that NNs are often criticized for lacking.
The paper is organized as follows. Section 2 reviews previous research on bankruptcy prediction based on data mining methods. Section 3 describes data mining methodologies. Section 4 discusses the data collected and Section 5 presents data analysis and prediction model building methods as well as the results obtained from different data mining techniques. Section 6 gives our conclusions.
Section snippets
Data mining model transparency
Model transparency relates to human ability to understand what the model consists of, leading ideally to the ability to apply it to new observations (which we might term transportability). If a model is transparent, it can be transported. Some models have consistently proven to be strong in their ability to fit data, such as neural network models, but to have low transparency or transportability. Neural networks by their nature involve highly complex sets of node connections and weights that
Data mining methodology
In a comparative analysis of multiple prediction models, it is common practice to split the complete data set into training and testing subsets, and to compare the prediction models based on their accuracy on the test data set. In splitting the data into training and testing data sets, one can choose to make a single split (e.g., half of the data for training and the other half for testing) or multiple splits, which is commonly referred to as k-fold cross validation. The idea
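The multiple-split scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the procedure used in the study: the function names `k_fold_splits` and `cross_validated_accuracy`, the `fit` and `predict` callables, and the shuffling seed are all assumptions introduced here.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training data is everything that is not in the current test fold.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average test-fold accuracy for hypothetical fit/predict callables."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(labels), k, seed):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)
```

Each observation is used for testing exactly once and for training k - 1 times, so every model being compared is scored on the same held-out cases.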
Results
The data were modeled using IBM SPSS Modeler (for logistic regression, radial basis function neural network, C5 and CART decision tree, and support vector machine (SVM) models). WEKA software, which offers more decision tree options [45], was used for comparison when corresponding models were available.
As expected, the models differed in accuracy. For this particular set of data, logistic regression was less accurate than decision trees, but
Conclusions
Different data mining models will fit any particular set of data to different degrees, which is why it is conventional to apply logistic regression, neural networks, and decision trees to the same data. Neural network models often fit a particular data set very well, but they are neither transparent nor easily transportable. Decision tree models are expressed in easily understood terms. A common problem with decision trees is that models generate too many rules. This can be controlled
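As a rough illustration of how a rule set can be kept small and then applied to new cases, the sketch below drops rules whose support falls under a threshold, in the spirit of the minimum-support adjustment mentioned in the highlights. It is an assumption-laden sketch, not the authors' procedure: the `Rule` class, the `prune_by_support` and `classify` helpers, the ratio names, the thresholds, and the support counts are all hypothetical and do not come from the study's data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    description: str                            # human-readable form of the rule
    condition: Callable[[Dict[str, float]], bool]
    label: str                                  # predicted class when the rule fires
    support: int                                # training cases covered by the rule


def prune_by_support(rules: List[Rule], min_support: int) -> List[Rule]:
    """Keep only rules that cover at least min_support training cases."""
    return [rule for rule in rules if rule.support >= min_support]


def classify(rules: List[Rule], case: Dict[str, float], default: str) -> str:
    """Return the label of the first rule that fires, else the default label."""
    for rule in rules:
        if rule.condition(case):
            return rule.label
    return default


# Hypothetical rules over made-up financial ratios (illustration only).
rules = [
    Rule("debt_ratio > 0.8", lambda c: c["debt_ratio"] > 0.8, "bankrupt", 40),
    Rule("roa < -0.1", lambda c: c["roa"] < -0.1, "bankrupt", 25),
    Rule("current_ratio < 0.2", lambda c: c["current_ratio"] < 0.2, "bankrupt", 3),
]

compact_rules = prune_by_support(rules, min_support=10)
new_case = {"debt_ratio": 0.9, "roa": 0.05, "current_ratio": 1.1}
prediction = classify(compact_rules, new_case, default="healthy")
```

Raising `min_support` shortens the rule list at the possible cost of some accuracy, and the surviving rules can be read and deployed as ordinary if-then statements.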
References (50)
- et al. Bankruptcy forecasting: an empirical comparison of AdaBoost and neural networks. Decision Support Systems (2008)
- et al. Credit risk measurement and early warning of SMEs: an empirical study of listed SMEs in China. Decision Support Systems (2010)
- et al. A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: for bankruptcy prediction. Expert Systems with Applications (2010)
- A comparative analysis of machine learning techniques for student retention management. Decision Support Systems (2010)
- et al. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems (2004)
- Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decision Support Systems (2004)
- et al. Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support. Decision Support Systems (2002)
- et al. Predicting going concern opinion with data mining. Decision Support Systems (2008)
- et al. Using non-linear methods to investigate the criterion validity of traffic-psychological test batteries. Accident Analysis and Prevention (2008)
- et al. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications (2005)
- Incorporating domain knowledge into data mining classifiers: an application in indirect lending. Decision Support Systems
- An agent-based decision support system for wholesale electricity market. Decision Support Systems
- Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decision Support Systems
- Comparing four bankruptcy prediction models: logit, quadratic interval logit, neural and fuzzy neural networks. Expert Systems with Applications
- Neural network ensemble strategies for financial decision applications. Computers and Operations Research
- Bankruptcy prediction using neural networks. Decision Support Systems
- Probabilistic neural networks in bankruptcy prediction. Journal of Business Research
- Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis. European Journal of Operational Research
- A multi-objective genetic programming approach to developing Pareto optimal decision trees. Decision Support Systems
- Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance
- Bankruptcy prediction for credit risk using neural networks: a survey and new results. IEEE Transactions on Neural Networks
- Financial ratios as predictors of failure. Journal of Accounting Research
- Bankruptcy prediction by generalized additive models. Applied Stochastic Models in Business and Industry
- Classification and Regression Trees
Dr. David L. Olson is the James & H.K. Stuart Professor in MIS and Chancellor's Professor at the University of Nebraska. He has published research in over 100 refereed journal articles, primarily on the topic of multiple objective decision-making and information technology. He teaches in the management information systems, management science, and operations management areas. He has authored 17 books, including Decision Aids for Selection Problems, Introduction to Information Systems Project Management, and Managerial Issues of Enterprise Resource Planning Systems, and has co-authored the books Introduction to Business Data Mining, Enterprise Risk Management, Advanced Data Mining Techniques, New Frontiers in Enterprise Risk Management, Enterprise Information Systems, and Enterprise Risk Management Models. He is associate editor of Service Business and co-editor in chief of International Journal of Services Sciences. He has made over 100 presentations at international and national conferences on research topics. He is a member of the Decision Sciences Institute, the Institute for Operations Research and Management Sciences, and the Multiple Criteria Decision Making Society. He was a Lowry Mays endowed Professor at Texas A&M University from 1999 to 2001. He received the Raymond E. Miles Distinguished Scholar award for 2002, and was a James C. and Rhonda Seacrest Fellow from 2005 to 2006. He was named Best Enterprise Information Systems Educator by IFIP in 2006. He is a Fellow of the Decision Sciences Institute.
Dr. Dursun Delen is an Associate Professor of Management Science and Information Systems in the Spears School of Business at Oklahoma State University (OSU). He received his Ph.D. in Industrial Engineering and Management from OSU in 1997. Prior to his appointment as an Assistant Professor at OSU in 2001, he worked for a private consultancy company, Knowledge Based Systems Inc., in College Station, Texas, as a research scientist for five years, during which he led a number of decision support and other information systems related research projects funded by federal agencies such as DoD, NASA, NIST and DOE. His research has appeared in major journals including Decision Support Systems, Communications of the ACM, Computers and Operations Research, Computers in Industry, Journal of Production Operations Management, Artificial Intelligence in Medicine, Expert Systems with Applications, among others. He has recently co-authored three books on data mining, decision support systems and business intelligence. He served as the general co-chair for the 4th International Conference on Network Computing and Advanced Information Management, and is regularly organizing tracks and mini-tracks for several international conferences. Dr. Delen serves on several technical journal editorial boards as associate editor-in-chief, associate editor and editorial board member. His research interests are in decision support systems, data/text mining, knowledge management, business intelligence and enterprise modeling.
Yanyan Meng is a Master's student in management information systems in the College of Business Administration, University of Nebraska – Lincoln.