Elsevier

Expert Systems with Applications

Volume 38, Issue 9, September 2011, Pages 11261-11272

Predicting corporate financial distress based on integration of decision tree classification and logistic regression

https://doi.org/10.1016/j.eswa.2011.02.173

Abstract

Stock and derivative securities markets around the world continue to evolve rapidly. As markets develop, enterprises disclose their operating status periodically in financial statements. Unfortunately, if a firm's executives intentionally dress up its financial statements, no sign of possible financial distress will be observed in either the short or the long run. Many financial crises have occurred in international markets in recent years, such as the Enron, Kmart, Global Crossing, WorldCom and Lehman Brothers events. How these events affect business worldwide, especially the financial services industry and investors, has become a matter of public concern. To improve the accuracy of the financial distress prediction model, this paper followed the operating rules of the Taiwan Stock Exchange Corporation (TSEC) and collected 100 listed companies as the initial sample. The empirical experiment used a total of 37 ratios, comprising financial and non-financial ratios, and applied principal component analysis (PCA) to extract suitable variables. Decision tree (DT) classification methods (C5.0, CART, and CHAID) and logistic regression (LR) techniques were used to implement the financial distress prediction model. Finally, the experiments produced satisfactory results, which support the feasibility and validity of the proposed methods for predicting the financial distress of listed companies.

This paper makes four main contributions: (1) the more PCA was applied, the lower the accuracy obtained by the DT classification approach, whereas PCA had no significant impact on the LR approach; (2) the closer we get to the actual occurrence of financial distress, the higher the accuracy of the DT classification approach, which classified 97.01% of cases correctly 2 seasons prior to the occurrence of financial distress; (3) our empirical results show that PCA increases the error of classifying companies that are in a financial crisis as normal companies; and (4) the DT classification approach obtains better prediction accuracy than the LR approach in the short run (less than one year), whereas the LR approach obtains better prediction accuracy in the long run (more than one and a half years). Therefore, this paper proposes that the artificial intelligence (AI) approach could be a more suitable methodology than traditional statistics for predicting the potential financial distress of a company in the short run.

Highlights

► Our empirical results show that PCA increases the error of classifying companies that are in a financial crisis as normal companies. ► The Decision Tree classification approach obtains better prediction accuracy in the short run. ► The Logistic Regression approach obtains better prediction accuracy in the long run.

Introduction

Recently, a series of financial crises at public companies has drawn wide attention in the business news. Some of these companies were well known and had traded at high stock prices (e.g., Enron Corp., Kmart Corp., WorldCom Corp., and Lehman Brothers Bank). When such a crisis breaks, it is usually too late for creditors to withdraw their loans or for investors to sell their stocks, futures, or options. Corporate bankruptcy is therefore an important economic phenomenon that affects the economy of every country. In Taiwan, domestic and foreign capital markets have developed rapidly in recent years, gradually encouraging people to make financial investments. Nevertheless, the bankruptcies of Procomp Corp. and Cdbank Corp. caused tremendous disorder in Taiwan's financial market, and related industries were also affected by these economic shocks. The number of bankrupt firms matters for a country's economy and can be viewed as an indicator of the economy's development and robustness (Zopounidis & Dimitras, 1998). The high individual, economic, and social costs of corporate failures and bankruptcies have spurred the search for better understanding and prediction capability (McKee & Lensberg, 2002). Forecasting corporate financial distress therefore plays an increasingly important role in today's society, since it has a significant impact on lending decisions and on the profitability of financial institutions.

A common approach to bankruptcy prediction is to survey the literature for a large set of potentially predictive financial and/or non-financial variables and then use traditional mathematical analysis to reduce it to a smaller set of significant variables that will predict bankruptcy (Lensberg, Eilifsen, & McKee, 2006). Many traditional classification techniques have been proposed to predict financial distress from ratios, e.g., univariate approaches (Beaver, 1966), multivariate approaches, linear multiple discriminant approaches (MDA) (Altman, 1968; Altman et al., 1977), multiple regression (Meyer & Pifer, 1970), logistic regression (Dimitras, Zanakis, & Zopounidis, 1996), factor analysis (Blum, 1974), and stepwise methods (Laitinen & Laitinen, 2000). However, the strict assumptions of traditional statistics, such as linearity, normality, independence among predictor variables, and a pre-existing functional form relating the criterion variable to the predictor variables, limit their application in the real world (Hua, Wang, Xu, Zhang, & Liang, 2007).

Therefore, this paper proposes a financial distress prediction model that compares decision tree (DT) classification and logistic regression (LR) techniques. The main objectives of this paper are to (1) adopt DT and LR techniques to construct a financial distress prediction model, (2) use financial and non-financial ratios to enhance the accuracy of the financial distress prediction model, (3) employ a traditional statistical method (principal component analysis, PCA) to compare the degree of accuracy with that of the artificial intelligence (AI) approach, and (4) expand this model so that it can work within a financial distress prediction system that provides information to investors as well as investment monitoring organizations. The data for our experiment were collected from the Taiwan Stock Exchange Corporation (TSEC) database.

The rest of this paper is organized as follows. Section 2 reviews the literature on related techniques. Section 3 describes our proposed approach and the capabilities of each of its steps. Section 4 presents the process for choosing appropriate variables by PCA. In Section 5, we carry out several experiments and analyze the prediction performance of our approach. In Section 6, we compare the results of the DT and LR approaches. Finally, we draw our conclusions and discuss future research in Section 7.

Section snippets

Decision trees algorithm

Data mining (DM), also known as “knowledge discovery in databases” (KDD), is the process of discovering meaningful patterns in large databases (Han & Kamber, 2001). It is also an application that can provide significant competitive advantages for making the right decision (Huang, Chen, & Lee, 2007). The more common model functions in current data mining processes include classification, regression, clustering, association rules, summarization, dependency modeling and sequence

Research methodology

In this research, we compare the financial distress prediction (FDP) performance of DT and LR techniques. The research methodology is shown in Fig. 1. In the FDP Choosing phase, we take the original large datasets from the TSEC and apply data pre-processing, which includes cleaning, normalization, transformation, feature extraction and selection. The product of data pre-processing is the final training and testing set. The goal in this phase is to choose the
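The normalization and train/test steps named in this snippet can be illustrated with a minimal sketch. The snippet does not say which normalization or split the authors used, so the z-score scaling, the 70/30 split, the fixed seed, the function names, and the two-ratio sample values below are all assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale every column (ratio) to zero mean and unit variance.

    Z-score scaling is an assumed choice; the source only says "normalization".
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid
    # a division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_train_test(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle the firms and split them into training and testing sets.

    The 30% test fraction and the seed are hypothetical parameters.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )


# Hypothetical sample: two made-up ratios per firm,
# label 1 = distressed firm, label 0 = normal firm.
ratios = [
    [1.2, 0.40], [0.8, 0.75], [1.5, 0.30], [0.6, 0.90], [1.1, 0.50],
    [0.7, 0.85], [1.4, 0.35], [0.5, 0.95], [1.3, 0.45], [0.9, 0.80],
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scaled = zscore_columns(ratios)
train_x, train_y, test_x, test_y = split_train_test(scaled, labels)
print(len(train_x), "training firms,", len(test_x), "testing firms")
```

For brevity the sketch scales the full sample before splitting; a stricter variant would compute the means and standard deviations on the training firms only and reuse them for the testing firms.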

Data

Our sample contained raw data from 100 Taiwanese firms listed on the TSEC. The sampling period ran from January 2000 to May 2007, a total of 7 years and 5 months. The 50 firms in financial distress were matched with 50 non-bankrupt firms. These firms were classified as non-bankrupt based on the absence of any indication or proof of financial distress in the auditors’ reports. All the variables used in the sample were extracted from formal financial statements,

DT experiments and results

This process uses the financial and non-financial ratios and constructs a financial distress prediction model after carrying out factor analysis a second time. The variables are then loaded as DT and LR input nodes. In addition, we apply these experiment parameters to the past 2 seasons, the past 4 seasons, the past 6 seasons, and the past 8 seasons before the financial distress occurred, in order to examine prediction accuracy. In this experiment, we will use the C5.0, CART, CHAID as

The FDP comparing phase

After implementing the FDP modeling phase, we compare the DT and LR approaches in terms of accuracy rate, Type II error rate, and factor analysis. Detailed descriptions are given in the following sections.
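As a minimal sketch of the two numeric criteria named above, the following computes an accuracy rate and a Type II error rate from actual and predicted labels. It assumes that a Type II error means classifying a company that is in financial crisis as a normal company, which is the misclassification the abstract discusses. The label encoding, the function names, and the sample predictions are hypothetical and are not results from the paper.

```python
DISTRESSED = 1  # assumed label encoding: firm in financial distress
NORMAL = 0      # assumed label encoding: normal firm


def confusion_counts(actual, predicted):
    """Count the four outcomes, treating DISTRESSED as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == DISTRESSED and p == DISTRESSED:
            counts["tp"] += 1
        elif a == NORMAL and p == NORMAL:
            counts["tn"] += 1
        elif a == NORMAL and p == DISTRESSED:
            counts["fp"] += 1
        else:  # distressed firm predicted as normal
            counts["fn"] += 1
    return counts


def accuracy_rate(actual, predicted):
    """Share of all firms that were classified correctly."""
    c = confusion_counts(actual, predicted)
    return (c["tp"] + c["tn"]) / len(actual)


def type_ii_error_rate(actual, predicted):
    """Share of distressed firms that were classified as normal (assumed definition)."""
    c = confusion_counts(actual, predicted)
    return c["fn"] / (c["fn"] + c["tp"])


# Made-up labels for eight firms and two hypothetical models.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
dt_pred = [1, 1, 1, 0, 0, 0, 0, 1]
lr_pred = [1, 1, 0, 0, 0, 0, 0, 0]

for name, pred in (("DT", dt_pred), ("LR", lr_pred)):
    print(name, accuracy_rate(actual, pred), type_ii_error_rate(actual, pred))
```

In this made-up example both models reach an accuracy rate of 0.75, but the Type II error rate is 0.25 for the first and 0.5 for the second, which is why the two criteria are worth reporting separately.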

Conclusions

This research focused on the financial and non-financial ratios in financial statements and used the DT and LR models to compare the performance of financial distress predictions, in order to find a better early-warning method. It took 50 companies that were facing a financial crisis and matched them with 50 normal companies from similar industries. In addition, we adopted the necessary dataset from the TSEC database and sampled them into the past 2, 4, 6, 8 seasons

Acknowledgements

We thank the National Scientific Council (NSC) of the Republic of China (ROC) for supporting this work under Grant No. NSC 96-2416-H-018-011. We also gratefully acknowledge the Editor and the anonymous reviewers for their valuable comments and constructive suggestions.

References (36)

  • T. Lensberg et al.

    Bankruptcy theory development and classification via genetic programming

    European Journal of Operational Research

    (2006)
  • H. Li et al.

    Majority voting combination of multiple case-based reasoning for financial distress prediction

    Expert Systems with Applications

    (2009)
  • T.E. McKee et al.

    Genetic programming and rough sets: A hybrid approach to bankruptcy classification

    European Journal of Operational Research

    (2002)
  • P. Xidonas et al.

    On the selection of equity securities: An expert systems methodology and an application on the Athens Stock Exchange

    Expert Systems with Applications

    (2009)
  • E.I. Altman

    Financial ratios, discriminant analysis and the prediction of corporate bankruptcy

    The Journal of Finance

    (1968)
  • W. Beaver

    Financial ratios as predictors of failure, empirical research in accounting: Selected studies

    Journal of Accounting Research

    (1966)
  • M. Blum

    Failing company discriminant analysis

    Journal of Accounting Research

    (1974)
  • L. Breiman et al.

    Classification and regression trees

    (1984)