Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: a case from the Spanish market

Faris, Hossam; Abukhurma, Ruba; Almanaseer, Waref; Saadeh, Mohammed; Mora, Antonio M.; Castillo, Pedro A.; Aljarah, Ibrahim

doi:10.1007/s13748-019-00197-9

Regular Paper
Published: 06 July 2019

Progress in Artificial Intelligence, Volume 9, pages 31–53 (2020)

Abstract

Bankruptcy is one of the most critical financial problems, as it reflects a company's failure. From a machine learning perspective, bankruptcy prediction is a challenging problem, mainly because of the highly imbalanced class distribution in the datasets. Developing an efficient model that can detect when a company is at risk is therefore a complex task. To tackle this problem, we propose a hybrid approach that combines the synthetic minority oversampling technique (SMOTE) with ensemble methods. We also apply five feature selection methods to identify the attributes with the greatest influence on bankruptcy prediction. The proposed approach is evaluated on a real dataset collected from Spanish companies. The experiments show promising results, indicating that the proposed approach is an efficient alternative for highly imbalanced datasets.
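The oversampling step can be illustrated with a minimal, pure-Python sketch of SMOTE. It assumes numeric feature vectors and uses the common variant in which each synthetic point is created by picking a random minority sample and one of its k nearest minority-class neighbours, then interpolating between them. The function names, the `seed` argument and the default `k=5` are illustrative assumptions; this excerpt does not say which SMOTE implementation the authors used.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25], [0.05, 0.15]]
    print(smote(minority, n_synthetic=3, k=3, seed=42))
```

Only minority-class (bankrupt) rows are passed in. In a typical setup the synthetic rows are added to the training data only, so the ensemble is still evaluated on real, imbalanced test data.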

Notes

Bought from http://infotel.es.

Acknowledgements

This work has been partially funded by projects TIN2017-85727-C4-2-P, RTI2018-102002-A-I00 (Spanish Ministry of Science, Innovation and Universities) and TEC2015-68752 (Spanish Ministry of Economy and Competitiveness + FEDER).

Author information

Authors and Affiliations

King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Hossam Faris, Ruba Abukhurma, Waref Almanaseer, Mohammed Saadeh & Ibrahim Aljarah
Department of Signal Theory, Telematics and Communications, ETSIIT and CITIC, University of Granada, Granada, Spain
Antonio M. Mora
Department of Computer Architecture and Technology, ETSIIT and CITIC, University of Granada, Granada, Spain
Pedro A. Castillo

Corresponding author

Correspondence to Ibrahim Aljarah (ORCID: orcid.org/0000-0002-9265-9819).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In the following tables, numbers in brackets are standard deviations. Ensemble approaches are named using the format ‘Ensemble technique/base learner (best number of iterations)’ (Tables 5, 6, 7 and Figs. 8, 9, 10).
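As a rough illustration of what the "number of iterations" in these names controls, the sketch below implements discrete AdaBoost with one-level decision stumps as the base learner. Reading "AB" in Table 7 as AdaBoost is an assumption, and stumps stand in for the paper's base learners (such as the REP tree) only to keep the example short; all function names are hypothetical. Labels are assumed to be -1/+1.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict +pol if x[f] <= threshold, otherwise -pol (labels are -1/+1)."""
    f, thr, pol = stump
    return pol if x[f] <= thr else -pol


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int],
               w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustively search every feature/threshold/polarity for the lowest weighted error."""
    best: Stump = (0, X[0][0], 1)
    best_err = float("inf")
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, thr, pol), err
    return best, best_err


def adaboost(X, y, n_iterations: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost: returns at most n_iterations (alpha, stump) pairs."""
    w = [1.0 / len(X)] * len(X)
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(n_iterations):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # base learner no better than chance on the weighted data
        eps = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        if err == 0:
            break  # training data fitted perfectly; more rounds add nothing
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote (ties go to +1)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Under the naming convention above, a hypothetical entry "AB/Stump (10)" would correspond to `adaboost(X, y, n_iterations=10)`; the "best" iteration count would be chosen by comparing several values on validation data.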

Table 5 Results of bankruptcy prediction without re-sampling

Table 6 Results of bankruptcy prediction with re-sampling

Table 7 AB-Rep tree with re-sampling based on top selected attributes (feature selection)
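Table 7 restricts the model to the top-ranked attributes. The abstract does not name the five feature selection methods, so the sketch below uses information gain, one common filter criterion, purely as an example: each numeric attribute is discretised into equal-width bins (the bin count of 4 is an assumption), scored by IG(Y; X) = H(Y) - H(Y | X), and the indices of the k highest-scoring attributes are returned. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each numeric value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin instead of an extra one
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature: Sequence[float], labels: Sequence, n_bins: int = 4) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) after discretising X."""
    bins = equal_width_bins(feature, n_bins)
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for y, bb in zip(labels, bins) if bb == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(X: Sequence[Sequence[float]], labels: Sequence, k: int,
                   n_bins: int = 4) -> List[int]:
    """Indices of the k attributes with the highest information gain."""
    scores = [information_gain([row[f] for row in X], labels, n_bins)
              for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]
```

To avoid optimistic estimates, the ranking would normally be computed on the training folds only; the selected indices are then used to subset both training and test rows before oversampling and fitting the ensemble.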

About this article

Cite this article

Faris, H., Abukhurma, R., Almanaseer, W. et al. Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: a case from the Spanish market. Prog Artif Intell 9, 31–53 (2020). https://doi.org/10.1007/s13748-019-00197-9

Received: 20 February 2019
Accepted: 26 June 2019
Published: 06 July 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s13748-019-00197-9