Financial Distress Prediction in an Imbalanced Data Stream Environment

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2023)

Abstract

Corporate bankruptcy prediction is crucial to companies, investors, and authorities. However, most bankruptcy prediction studies rely on stationary models and tend to ignore important challenges of financial distress prediction, such as data non-stationarity, concept drift, and data imbalance. This study proposes methods for dealing with these challenges and uses data from the quarterly financial statements that companies provide to the Securities and Exchange Commission of Brazil (CVM). The data set covers 10 years (2011 to 2020) and comprises 905 different corporations and 23,834 records with 82 indicators each. Most of the sample shows no financial difficulties, and only 651 companies have financial distress. The empirical experiment uses a sliding window, a history, and a forgetting mechanism to avoid degradation of the predictive model due to concept drift. Given the characteristics of the problem, especially the data imbalance, model performance is measured with AUC, G-mean, and F1-score, which reached 0.95, 0.68, and 0.58, respectively.
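As a rough illustration of two ideas named in the abstract, the sketch below shows a generic sliding window over time-ordered periods, in which data older than the window is forgotten, and the standard definitions of G-mean and F1-score computed from confusion-matrix counts. It is not the authors' implementation: the function names, the window size, and the quarter labels are assumptions made for this example, and the history mechanism mentioned in the abstract is not modelled.

```python
from math import sqrt
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(
    periods: Sequence[str], window_size: int
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training periods, test period) pairs in time order.

    Only the most recent `window_size` periods are used for training;
    anything older falls out of the window and is forgotten.
    (Hypothetical helper, not taken from the paper.)
    """
    for end in range(window_size, len(periods)):
        yield list(periods[end - window_size:end]), periods[end]


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity (recall on the distressed class)
    and specificity (recall on the healthy class)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Illustrative quarter labels only; not the paper's data.
    quarters = ["2011Q1", "2011Q2", "2011Q3", "2011Q4", "2012Q1"]
    for train, test in sliding_windows(quarters, window_size=3):
        print(train, "->", test)

    # Made-up confusion counts for an imbalanced test period.
    print(round(g_mean(tp=8, fn=2, tn=90, fp=10), 4))
    print(round(f1_score(tp=8, fp=10, fn=2), 4))
```

Both metrics depend on how well the minority (distressed) class is recovered, which is why they are more informative than plain accuracy when most records belong to the healthy class.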

Notes

  1. https://dados.cvm.gov.br/

  2. https://github.com/rubensmchaves/ml-fdp

Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Correspondence to Rubens Marques Chaves or Luís Paulo Faina Garcia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chaves, R.M., Rossi, A.L.D., Garcia, L.P.F. (2023). Financial Distress Prediction in an Imbalanced Data Stream Environment. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_15

  • DOI: https://doi.org/10.1007/978-3-031-40725-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40724-6

  • Online ISBN: 978-3-031-40725-3

  • eBook Packages: Computer Science, Computer Science (R0)
