Elsevier

Decision Support Systems

Volume 52, Issue 2, January 2012, Pages 464-473

Comparative analysis of data mining methods for bankruptcy prediction

https://doi.org/10.1016/j.dss.2011.10.007

Abstract

A great deal of research has been devoted to the prediction of bankruptcy, including the application of data mining. Neural networks, support vector machines, and other algorithms often fit data well, but because they lack comprehensibility, they are considered black box technologies. Decision trees, by contrast, are more comprehensible to human users. However, far too many rules can result in another form of incomprehensibility. The number of rules obtained from decision tree algorithms can be controlled to some degree by setting different minimum support levels. This study applies a variety of data mining tools to bankruptcy data, with the purpose of comparing accuracy and number of rules. For this data, decision trees were found to be relatively more accurate than neural networks and support vector machines, but there were more rule nodes than desired. Adjustment of minimum support yielded more tractable rule sets.

Highlights

► Decision tree model advantages with respect to usability.
► Comparison of decision tree models for bankruptcy data.
► Adjusting minimum support to yield comprehensible rule sets.

Introduction

Bankruptcy prediction has been a focus of study in business analytics because of the importance of accurate and timely strategic business decisions. Even though the accuracy of the prediction model is a very important criterion, understandability and transportability of the model are also important. The accurate prediction of bankruptcy has been a critical issue to shareholders, creditors, policy makers, and business managers.

There is a wealth of research that has been applied to this field [6], [9], [30], [32], [39], [42], both in finance and in other fields [38]. Among the thousands of refereed journal articles, many recent studies have applied neural networks (NNs) [1], [3], [18], [19], [23], [24], [27], [33], [34], [43], [44], [46], [48]. Another popular approach is decision trees (DTs) [10], [36], [41], [49]. Support vector machines (SVMs) have been proposed for smaller datasets with highly nonlinear relationships [12], [15], [21], [35], [40].

The vast majority of studies in this domain have focused on NNs and on how well they fit data (fidelity [22]) compared to their statistical counterpart, logistic regression. However, neural network models are black boxes [4], [50], lacking transparency (seeing what the model is doing, or comprehensibility) and transportability (being able to easily deploy the model into a decision support system for new cases). We argue that decision trees (DTs) can be as accurate, while providing the transparency and transportability that NNs are often criticized for lacking.

The paper is organized as follows. Section 2 reviews previous research on bankruptcy prediction based on data mining methods. Section 3 describes data mining methodologies. Section 4 discusses the data collected and Section 5 presents data analysis and prediction model building methods as well as the results obtained from different data mining techniques. Section 6 gives our conclusions.

Section snippets

Data mining model transparency

Model transparency relates to human ability to understand what the model consists of, leading ideally to the ability to apply it to new observations (which we might term transportability). If a model is transparent, it can be transported. Some models have consistently proven to be strong in their ability to fit data, such as neural network models, but to have low transparency or transportability. Neural networks by their nature involve highly complex sets of node connections and weights that

Data mining methodology

In a comparative analysis of multiple prediction models, it is common practice to split the complete data set into training and testing subsets, and to compare the prediction models based on their accuracy on the test data set. In splitting the data into training and testing data sets, one can choose to make a single split (e.g., half of the data for training and the other half for testing) or multiple splits, which is commonly referred to as k-fold cross validation. The idea
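As a rough illustration of the multiple-split idea described above, the following Python sketch partitions case indices into k folds and averages test accuracy across them. It is not the procedure implemented in IBM SPSS Modeler or WEKA; the names k_fold_splits, cross_validated_accuracy, and fit_majority, as well as the majority-class baseline, are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one case.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(fit: Callable, X: Sequence, y: Sequence, k: int = 10, seed: int = 0) -> float:
    """Average test accuracy over k folds.

    `fit(X_train, y_train)` is assumed to return a callable that
    predicts a label for a single case.
    """
    scores = []
    for train, test in k_fold_splits(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train: Sequence, y_train: List[int]) -> Callable:
    """Baseline 'model' (illustrative only): always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


if __name__ == "__main__":
    # Toy data: 20 cases, 15 labelled 0 and 5 labelled 1 (not the bankruptcy data).
    X = [[i] for i in range(20)]
    y = [0] * 15 + [1] * 5
    print(cross_validated_accuracy(fit_majority, X, y, k=5))
```

Each case is used for testing exactly once and for training in the remaining k - 1 folds, which is what distinguishes this from a single split. Any model-fitting function with the same calling convention could be substituted for the baseline.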

Results

The data was modeled using IBM SPSS Modeler (for logistic regression, radial-basis function neural network, C5 and CART decision tree, and support vector machine (SVM) models). WEKA software was used for comparison where corresponding models were available; it also offered a wider range of decision tree tools [45].

As expected, different models had different accuracies. For this particular set of data, logistic regression was less accurate than decision trees, but

Conclusions

Any particular set of data will have different relative fits from different data mining models. That is why it is conventional to apply logistic regression, neural networks, and decision trees to data. Neural network models often provide a very good fit to a particular data set, but they are neither transparent nor easily transportable. Decision tree models are expressed in easily understood terms. A common problem with decision trees is that models generate too many rules. This can be controlled
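The relationship between minimum support and rule count can be illustrated with a small Python sketch. It assumes that "minimum support" means the minimum number of training cases each leaf must cover, and it grows a simple greedy tree with Gini impurity; this is a stand-in for illustration, not the C5, CART, or WEKA algorithms used in the study. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[int], min_support: int) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces weighted Gini, or None.

    A split is only considered if both children cover at least
    `min_support` cases (the assumed meaning of minimum support).
    """
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if len(left) < min_support or len(right) < min_support:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # require a strict improvement
                best, best_score = (f, t), score
    return best


def count_rules(X: List[List[float]], y: List[int], min_support: int) -> int:
    """Number of leaves of the greedily grown tree (one rule per leaf)."""
    split = best_split(X, y, min_support)
    if split is None:
        return 1
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (count_rules([X[i] for i in left], [y[i] for i in left], min_support)
            + count_rules([X[i] for i in right], [y[i] for i in right], min_support))


if __name__ == "__main__":
    # Deliberately noisy toy data (alternating labels), not the bankruptcy data.
    X = [[float(i)] for i in range(8)]
    y = [0, 1, 0, 1, 0, 1, 0, 1]
    for support in (1, 2, 4):
        print(support, count_rules(X, y, support))
```

On this toy data, a minimum support of 1 lets the tree isolate every case in its own rule, while larger values force broader and fewer rules. The trade-off is that the fewer, broader rules fit the training cases less closely.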

References (50)

  • A.P. Sinha et al.

    Incorporating domain knowledge into data mining classifiers: an application in indirect lending

    Decision Support Systems

    (2008)
  • T. Sueyoshi et al.

    An agent-based decision support system for wholesale electricity market

    Decision Support Systems

    (2008)
  • C.-F. Tsai et al.

    Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches

    Decision Support Systems

    (2010)
  • F.-M. Tseng et al.

    Comparing four bankruptcy prediction models: logit, quadratic interval logit, neural and fuzzy neural networks

    Expert Systems with Applications

    (2010)
  • D. West et al.

    Neural network ensemble strategies for financial decision applications

    Computers and Operations Research

    (2005)
  • R.L. Wilson et al.

    Bankruptcy prediction using neural networks

    Decision Support Systems

    (1994)
  • Z.R. Yang et al.

    Probabilistic neural networks in bankruptcy prediction

    Journal of Business Research

    (1999)
  • G. Zhang et al.

    Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis

    European Journal of Operational Research

    (1999)
  • H. Zhao

    A multi-objective genetic programming approach to developing Pareto optimal decision trees

    Decision Support Systems

    (2007)
  • E.I. Altman

    Financial ratios, discriminant analysis and the prediction of corporate bankruptcy

    Journal of Finance

    (1968)
  • A.F. Atiya

    Bankruptcy prediction for credit risk using neural networks: a survey and new results

    IEEE Transactions on Neural Networks

    (2001)
  • N. Barakat, A.P. Bradley, Rule extraction from support vector machines: a review, Neurocomputing 74 (1–3) (2010)...
  • W.H. Beaver

    Financial ratios as predictors of failure

    Journal of Accounting Research

    (1966)
  • D. Berg

    Bankruptcy prediction by generalized additive models

    Applied Stochastic Models in Business and Industry

    (2007)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)

    Dr. David L. Olson is the James & H.K. Stuart Professor in MIS and Chancellor's Professor at the University of Nebraska. He has published research in over 100 refereed journal articles, primarily on the topic of multiple objective decision-making and information technology. He teaches in the management information systems, management science, and operations management areas. He has authored 17 books, including Decision Aids for Selection Problems, Introduction to Information Systems Project Management, and Managerial Issues of Enterprise Resource Planning Systems, and has co-authored the books Introduction to Business Data Mining, Enterprise Risk Management, Advanced Data Mining Techniques, New Frontiers in Enterprise Risk Management, Enterprise Information Systems, and Enterprise Risk Management Models. He is associate editor of Service Business and co-editor in chief of International Journal of Services Sciences. He has made over 100 presentations at international and national conferences on research topics. He is a member of the Decision Sciences Institute, the Institute for Operations Research and the Management Sciences, and the Multiple Criteria Decision Making Society. He was a Lowry Mays endowed Professor at Texas A&M University from 1999 to 2001. He received the Raymond E. Miles Distinguished Scholar award for 2002, and was a James C. and Rhonda Seacrest Fellow from 2005 to 2006. He was named Best Enterprise Information Systems Educator by IFIP in 2006. He is a Fellow of the Decision Sciences Institute.

    Dr. Dursun Delen is an Associate Professor of Management Science and Information Systems in the Spears School of Business at Oklahoma State University (OSU). He received his Ph.D. in Industrial Engineering and Management from OSU in 1997. Prior to his appointment as an Assistant Professor at OSU in 2001, he worked for a private consultancy company, Knowledge Based Systems Inc., in College Station, Texas, as a research scientist for five years, during which he led a number of decision support and other information systems related research projects funded by federal agencies such as DoD, NASA, NIST and DOE. His research has appeared in major journals including Decision Support Systems, Communications of the ACM, Computers and Operations Research, Computers in Industry, Journal of Production Operations Management, Artificial Intelligence in Medicine, Expert Systems with Applications, among others. He has recently co-authored three books on data mining, decision support systems and business intelligence. He served as the general co-chair for the 4th International Conference on Network Computing and Advanced Information Management, and is regularly organizing tracks and mini-tracks for several international conferences. Dr. Delen serves on several technical journal editorial boards as associate editor-in-chief, associate editor and editorial board member. His research interests are in decision support systems, data/text mining, knowledge management, business intelligence and enterprise modeling.

    Yanyan Meng is a Master's student in management information systems in the College of Business Administration, University of Nebraska – Lincoln.
