Abstract
Bankruptcy is one of the most critical financial problems a company can face, since it reflects the company's failure. From a machine learning perspective, bankruptcy prediction is challenging mainly because the classes in the datasets are highly imbalanced. Developing an efficient prediction model that can detect when a company is at risk is therefore a complex task. To tackle this problem, we propose a hybrid approach that combines the synthetic minority oversampling technique (SMOTE) with ensemble methods. Moreover, we apply five different feature selection methods to identify the most dominant attributes for bankruptcy prediction. The proposed approach is evaluated on a real dataset collected from Spanish companies. The experiments show promising results, which indicate that the proposed approach can be used as an efficient alternative for highly imbalanced datasets.
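To make the idea of combining oversampling with an ensemble concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a simplified SMOTE that interpolates between a minority sample and one of its nearest minority neighbours, and it uses bagging over a nearest-centroid base learner purely as a stand-in for the ensemble methods. The toy data and all function names (`smote`, `bagging_fit`, `bagging_predict`, and so on) are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base (itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def fit_centroid(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(x, model[label]))


def bagging_fit(X, y, n_models=11, seed=0):
    """Train each base learner on a bootstrap sample of the balanced data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the base learners."""
    votes = [predict_centroid(m, x) for m in models]
    return max(set(votes), key=votes.count)


# Toy imbalanced data (hypothetical): 90 "solvent" points near the origin
# and 5 "bankrupt" points near (3, 3).
majority = [[0.1 * (i % 10), 0.1 * (i // 10)] for i in range(90)]
minority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [3.1, 3.3], [2.9, 2.7]]

# Oversample the minority class until both classes have 90 samples.
synthetic = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

models = bagging_fit(X, y)
prediction_near_minority = bagging_predict(models, [3.0, 3.0])
prediction_near_majority = bagging_predict(models, [0.2, 0.2])
```

In practice the oversampling step should be applied only to the training split, so that synthetic samples never leak into the data used for evaluation.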
Notes
Bought from http://infotel.es.
Acknowledgements
This work has been partially funded by projects TIN2017-85727-C4-2-P, RTI2018-102002-A-I00 (Spanish Ministry of Science, Innovation and Universities) and TEC2015-68752 (Spanish Ministry of Economy and Competitiveness + FEDER).
About this article
Cite this article
Faris, H., Abukhurma, R., Almanaseer, W. et al. Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: a case from the Spanish market. Prog Artif Intell 9, 31–53 (2020). https://doi.org/10.1007/s13748-019-00197-9
DOI: https://doi.org/10.1007/s13748-019-00197-9