Predicting corporate financial distress based on integration of decision tree classification and logistic regression
Highlights
► Our empirical results show that PCA increases the error of classifying companies that are in a financial crisis as normal companies.
► The decision tree (DT) classification approach achieves better prediction accuracy in the short run.
► The logistic regression (LR) approach achieves better prediction accuracy in the long run.
Introduction
Recently, some of the most prominent business news has concerned a series of financial crises at public companies. Several of these companies were well known and originally traded at high stock prices (e.g., Enron Corp., Kmart Corp., WorldCom Corp., Lehman Brothers Bank). Once such a crisis breaks, it is usually too late for creditors to recall their loans or for investors to sell their stocks, futures, or options. Corporate bankruptcy is therefore an important economic phenomenon that affects the economy of every country. In Taiwan, domestic and foreign capital markets have developed rapidly in recent years, and financial investment has become increasingly common. Nevertheless, the bankruptcies of Procomp Corp. and Cdbank Corp. caused tremendous disorder in Taiwan's financial market, and related industries were also affected by these economic shocks. The number of bankrupt firms matters to a country's economy and can be viewed as an indicator of the economy's development and robustness (Zopounidis & Dimitras, 1998). The high individual, economic, and social costs of corporate failures and bankruptcies have spurred the search for better understanding and prediction capability (McKee & Lensberg, 2002). Forecasting corporate financial distress therefore plays an increasingly important role in today's society, since it has a significant impact on lending decisions and the profitability of financial institutions.
A common methodology in bankruptcy prediction is to survey the literature for a large set of potentially predictive financial and/or non-financial variables, and then to use traditional mathematical analysis to eliminate the insignificant ones, leaving a set of variables that predicts bankruptcy (Lensberg, Eilifsen, & McKee, 2006). Many traditional classification techniques have been proposed for predicting financial distress from ratios, e.g., univariate approaches (Beaver, 1966), multivariate approaches, linear multiple discriminant analysis (MDA) (Altman, 1968; Altman et al., 1977), multiple regression (Meyer & Pifer, 1970), logistic regression (Dimitras, Zanakis, & Zopounidis, 1996), factor analysis (Blum, 1974), and stepwise methods (Laitinen & Laitinen, 2000). However, the strict assumptions of traditional statistics, such as linearity, normality, independence among predictor variables, and a pre-existing functional form relating the criterion variable to the predictor variables, limit their application in the real world (Hua, Wang, Xu, Zhang, & Liang, 2007).
This paper therefore proposes a financial distress prediction model that compares decision tree (DT) classification with logistic regression (LR). The main objectives of this paper are to (1) adopt DT and LR techniques to construct a financial distress prediction model, (2) use financial and non-financial ratios to enhance the accuracy of the model, (3) employ a traditional statistical method, principal component analysis (PCA), and compare its accuracy with that of the artificial intelligence (AI) approach, and (4) extend the model into a financial distress prediction system that provides information to investors as well as investment-monitoring organizations. The data for our experiments were collected from the Taiwan Stock Exchange Corporation (TSEC) database.
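Because PCA serves here as the traditional variable-reduction benchmark, a minimal stdlib-only Python sketch may help show what it computes. It assumes the ratios are standardized first, forms their correlation matrix, and extracts only the leading principal component by power iteration. The excerpt does not say how PCA was computed in this study; the function names and the toy ratio matrix are hypothetical, and a real analysis would extract several components (for example with a linear-algebra library) and apply a retention rule.

```python
import math


def standardize(rows):
    """Center each ratio (column) and scale it to unit population variance.

    Assumes no column is constant; a zero standard deviation would raise ZeroDivisionError.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of data already standardized by standardize()."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def leading_component(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    v = [1.0] * p  # starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


# Hypothetical toy data: 4 firms x 3 ratios that move together.
rows = [[x, 2 * x, -x] for x in (1.0, 2.0, 3.0, 4.0)]
lam, loadings = leading_component(correlation_matrix(standardize(rows)))
share = lam / len(rows[0])  # fraction of total standardized variance explained
```

Because the three toy ratios are perfectly correlated, one component explains all of their standardized variance; with real ratios, the share explained by each component is what a retention rule would inspect. Power iteration is adequate here because correlation matrices are positive semidefinite.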
The rest of this paper is organized as follows. Section 2 reviews the literature on related techniques. Section 3 describes the proposed approach and the function of each of its steps. Section 4 presents the process of choosing appropriate variables by PCA. Section 5 analyzes the prediction performance of our approach through several experiments. Section 6 compares the results of the DT and LR approaches. Finally, Section 7 presents our conclusions and discusses future research.
Decision tree algorithms
Data mining (DM), also known as “knowledge discovery in databases” (KDD), is the process of discovering meaningful patterns in huge databases (Han & Kamber, 2001). It can also provide significant competitive advantages in making the right decisions (Huang, Chen, & Lee, 2007). The most common model functions in current data mining practice include classification, regression, clustering, association rules, summarization, dependency modeling and sequence
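To make the decision tree idea concrete, the following Python sketch shows the core step of CART, one of the tree algorithms named in the experiments: choosing the threshold on a single ratio that minimizes the weighted Gini impurity of the two child nodes. It covers CART's criterion only (C5.0 uses an entropy-based criterion and CHAID uses chi-square tests), it is not the study's implementation, and the return-on-assets values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of 0/1 labels (1 = distressed, 0 = normal)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Threshold on one ratio that minimizes the weighted Gini impurity of the children.

    Candidate thresholds are midpoints between consecutive distinct sorted values;
    records with value <= threshold go to the left child.
    Returns (threshold, weighted_impurity); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)  # impurity without splitting
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical values cannot be separated
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical return-on-assets values for six firms and their distress labels.
roa = [0.08, -0.20, 0.12, -0.05, 0.04, -0.10]
distressed = [0, 1, 0, 1, 0, 1]
threshold, impurity = best_split(roa, distressed)
```

Here the split falls midway between -0.05 and 0.04 and separates the two classes perfectly. A full tree repeats this search over every input variable and recurses on each child until a stopping or pruning rule applies.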
Research methodology
In this research, we compare the financial distress prediction (FDP) performance of DT and LR techniques. Fig. 1 shows the research methodology. In the FDP choosing phase, the large raw datasets from the TSEC undergo data pre-processing, which includes cleaning, normalization, transformation, feature extraction, and selection. The output of data pre-processing is the final training and testing set. The goal in this phase is to choose the
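The pre-processing steps listed above can be sketched in Python as follows, assuming each record is a list of ratios with `None` marking a missing value, min-max scaling as the normalization, and a seeded random hold-out as the training/testing split. These choices, the function names, and the sample records are hypothetical illustrations; the excerpt does not state which normalization or split the study used.

```python
import random


def clean(rows, labels):
    """Drop firm records that have any missing ratio (None), keeping labels aligned."""
    kept = [(r, y) for r, y in zip(rows, labels) if all(v is not None for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


def train_test_split(rows, labels, test_fraction=0.3, seed=42):
    """Reproducibly shuffle record indices and hold out a fraction for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def fit_min_max(rows):
    """Column-wise minimum and maximum, computed on the training set only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Rescale with training bounds; constant columns map to 0.0.

    Test values outside the training range can fall outside [0, 1].
    """
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]


# Hypothetical records: [debt ratio, current ratio]; label 1 = distressed.
rows = [[0.9, 0.6], [None, 1.1], [0.4, 1.8], [0.8, 0.7], [0.3, 2.1], [0.5, 1.5]]
labels = [1, 1, 0, 1, 0, 0]
rows, labels = clean(rows, labels)
X_train, y_train, X_test, y_test = train_test_split(rows, labels, test_fraction=0.2)
lows, highs = fit_min_max(X_train)
X_train_n = apply_min_max(X_train, lows, highs)
X_test_n = apply_min_max(X_test, lows, highs)
```

The scaling bounds are fitted after the split and on the training set only, so that no information from the testing set leaks into model construction.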
Data
Our sample contained raw data from 100 Taiwanese firms listed on the TSEC. The sampling period ran from January 2000 to May 2007, a span of 7 years and 5 months. The 50 firms in financial distress were matched with 50 non-bankrupt firms. Firms were classified as non-bankrupt when the auditors' reports contained no indication or evidence of financial distress. All the variables used in the sample were extracted from formal financial statements,
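The conclusions state that each distressed company was matched with a normal company from a similar industry, but this excerpt does not give the exact matching rule. The Python sketch below shows one possible matched-pair procedure under stated assumptions: firms are matched on an identical industry code, ties are broken by the closest total assets (an assumed secondary criterion), and each normal firm is used at most once. The `Firm` fields and sample firms are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Firm:
    ticker: str          # hypothetical firm identifier
    industry: str        # industry code used for matching
    total_assets: float  # size proxy; an assumed secondary matching criterion
    distressed: bool


def match_pairs(firms):
    """Greedily pair each distressed firm with an unused normal firm in the same
    industry, preferring the closest total assets. Distressed firms with no
    available same-industry partner are skipped."""
    normal = [f for f in firms if not f.distressed]
    used = set()
    pairs = []
    for d in (f for f in firms if f.distressed):
        candidates = [h for h in normal
                      if h.industry == d.industry and h.ticker not in used]
        if not candidates:
            continue
        partner = min(candidates, key=lambda h: abs(h.total_assets - d.total_assets))
        used.add(partner.ticker)
        pairs.append((d, partner))
    return pairs


firms = [
    Firm("A", "semiconductor", 100.0, True),
    Firm("F", "semiconductor", 95.0, True),
    Firm("B", "semiconductor", 90.0, False),
    Firm("C", "semiconductor", 300.0, False),
    Firm("D", "banking", 50.0, True),
    Firm("E", "steel", 60.0, False),
]
pairs = match_pairs(firms)
```

The greedy order matters: firm A claims B first, so F is paired with C even though B would have been F's closer match. An optimal assignment method would minimize the total size difference across all pairs instead.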
DT experiments and results
This process uses the financial and non-financial ratios and constructs a financial distress prediction model after a second round of factor analysis. The resulting variables are then used as the input nodes of the DT and LR models. We also apply these experimental settings to data from the 2, 4, 6, and 8 seasons (quarters) before financial distress occurred, to examine how prediction accuracy changes with the prediction horizon. In this experiment, we use C5.0, CART, CHAID as
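For the LR side of the comparison, a minimal Python sketch of a logistic regression fitted to factor scores is shown below. It models P(distress | x) = 1 / (1 + exp(-(b0 + w·x))) and uses plain batch gradient descent on the mean log-loss as a stand-in for the maximum-likelihood fitting a statistical package would perform; the excerpt does not say how the study estimated its LR model. The learning rate, epoch count, and single-factor data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit P(distress | x) = sigmoid(b0 + w.x) by batch gradient descent on mean log-loss."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w


def predict_proba(b0, w, x):
    """Estimated probability that a firm with inputs x is in financial distress."""
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical single factor score per firm (higher = healthier); label 1 = distressed.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [1, 1, 0, 1, 0, 0]
b0, w = fit_logistic(X, y)
```

The fitted weight on the factor score is negative, so a lower (less healthy) score yields a higher distress probability; classifying a firm as distressed when the probability exceeds 0.5 is a common default cutoff, not one taken from the study.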
The FDP comparing phase
After the FDP modeling phase is complete, we compare the DT and LR approaches in terms of accuracy rate, Type II error rate, and the effect of factor analysis. The details are discussed in the following sections.
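The comparison metrics can be sketched from a confusion matrix as follows. This excerpt names the Type II error rate but does not define which misclassification it counts, and bankruptcy studies differ on whether "Type I" or "Type II" refers to a distressed firm classified as normal. The sketch therefore reports both error directions under descriptive names: `missed_distress_rate` (a distressed firm classified as normal, the error the Highlights say PCA increases) and `false_alarm_rate` (a normal firm classified as distressed). The label encoding 1 = distressed is an assumption.

```python
def evaluate(y_true, y_pred):
    """Confusion-matrix metrics with 1 = distressed and 0 = normal (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # distressed called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal called distressed
    n_distressed, n_normal = tp + fn, tn + fp
    return {
        "accuracy": (tp + tn) / len(pairs),
        "missed_distress_rate": fn / n_distressed if n_distressed else 0.0,
        "false_alarm_rate": fp / n_normal if n_normal else 0.0,
    }


# Hypothetical predictions for 4 distressed and 4 normal firms.
metrics = evaluate([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

Reporting both error directions separately matters because accuracy alone can hide a model that labels too many distressed firms as normal, which is typically the costlier mistake for creditors and investors.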
Conclusions
This research focused on the financial and non-financial ratios in financial statements and used DT and LR models to compare financial distress prediction performance, in order to find a better early-warning method. We took 50 companies that were facing a financial crisis and matched them with 50 normal companies from similar industries. In addition, we obtained the necessary data from the TSEC database and sampled them into the past 2, 4, 6, 8 seasons
Acknowledgements
We thank the National Science Council (NSC) of the Republic of China (ROC) for supporting this work under Grant No. NSC 96-2416-H-018-011. We also gratefully acknowledge the Editor and anonymous reviewers for their valuable comments and constructive suggestions.
References (36)
- et al. (1977). A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance.
- et al. (2009). Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Systems with Applications.
- et al. (2007). Comparison of logistic regression model and classification tree: An application to postpartum depression data. Expert Systems with Applications.
- et al. (2009). Applying decision tree and neural network to increase quality of dermatologic diagnosis. Expert Systems with Applications.
- et al. (1996). A survey of business failure with an emphasis on prediction methods and industrial applications. European Journal of Operational Research.
- et al. (2007). Predicting corporate financial distress based on integration of support vector machine and logistic regression. Expert Systems with Applications.
- et al. (2007). Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Expert Systems with Applications.
- et al. (1989). Characteristics of firms correcting previously reported quarterly earnings. Journal of Accounting and Economics.
- et al. (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications.
- et al. (2000). Bankruptcy prediction application of the Taylor’s expansion in logistic regression. International Review of Financial Analysis.
- Bankruptcy theory development and classification via genetic programming. European Journal of Operational Research.
- Majority voting combination of multiple case-based reasoning for financial distress prediction. Expert Systems with Applications.
- Genetic programming and rough sets: A hybrid approach to bankruptcy classification. European Journal of Operational Research.
- On the selection of equity securities: An expert systems methodology and an application on the Athens Stock Exchange. Expert Systems with Applications.
- Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance.
- Financial ratios as predictors of failure, Empirical research in accounting: Selected studies. Journal of Accounting Research.
- Failing company discriminant analysis. Journal of Accounting Research.
- Classification and regression trees.