Abstract
Corporate bankruptcy prediction is crucial to companies, investors, and authorities. However, most bankruptcy prediction studies rely on stationary models and tend to ignore important challenges of financial distress data, such as non-stationarity, concept drift, and class imbalance. This study proposes methods for dealing with these challenges and uses data collected from the quarterly financial statements that companies file with the Securities and Exchange Commission of Brazil (CVM). The dataset covers 10 years (2011 to 2020) and comprises 905 different corporations and 23,834 records with 82 indicators each. Most of the sample shows no financial difficulties, and only 651 companies have financial distress. The empirical experiment uses a sliding window, a history, and a forgetting mechanism to avoid the degradation of the predictive model due to concept drift. Given the characteristics of the problem, especially the data imbalance, the performance of the models is measured through AUC, G-mean, and F1-score, for which the models achieved 0.95, 0.68, and 0.58, respectively.
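As a rough illustration of why G-mean and F1-score are preferred over plain accuracy under class imbalance, the following minimal sketch computes both from binary labels. The function names and the toy labels are hypothetical and are not taken from the paper; the label 1 is assumed to denote the minority (distressed) class.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating label 1 as the positive (distressed) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of correctly classified records."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Toy example (invented data): 2 distressed records among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
always_healthy = [0] * len(y_true)

print(accuracy(y_true, y_pred), g_mean(y_true, y_pred), f1_score(y_true, y_pred))
print(accuracy(y_true, always_healthy), g_mean(y_true, always_healthy), f1_score(y_true, always_healthy))
```

On this toy data, a classifier that always predicts "no distress" matches the accuracy of the imperfect classifier, yet its G-mean and F1-score are both zero, which is the behaviour that makes these metrics informative when the distressed class is rare.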
Acknowledgment
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chaves, R.M., Rossi, A.L.D., Garcia, L.P.F. (2023). Financial Distress Prediction in an Imbalanced Data Stream Environment. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_15
DOI: https://doi.org/10.1007/978-3-031-40725-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40724-6
Online ISBN: 978-3-031-40725-3
eBook Packages: Computer Science, Computer Science (R0)