Abstract
Credit scoring is a vital task in the financial industry for assessing the creditworthiness of companies and mitigating credit risk. In recent years, machine learning algorithms have shown promising results in credit scoring by leveraging large amounts of tabular data. However, tabular data alone may not capture all the information relevant to credit scoring that credit risk analysts typically use. In this paper, we propose a novel approach to company credit scoring that integrates text and tabular data. Our method uses natural language processing techniques to extract key features from risk assessments written by credit risk experts; these features are then combined with financial data to predict the likelihood of default within a one-year horizon. We compare several machine learning models across different text embedding techniques. Our results show that adding a textual feature improves the model's ability to identify companies that default. Specifically, adding a categorical feature generated by applying sentiment analysis to the textual risk assessments yields the best results.
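The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word lists, the three sentiment classes, the column names, and the functions `sentiment_label` and `fuse` are all hypothetical, and a simple lexicon count stands in for whatever sentiment model the paper actually uses. The sketch only shows how a categorical sentiment feature derived from a written risk assessment might be one-hot encoded and appended to a row of financial features.

```python
from typing import Dict, List

# Hypothetical toy lexicon: a stand-in for a real sentiment model,
# which the abstract does not specify.
NEGATIVE = {"default", "loss", "overdue", "decline", "litigation"}
POSITIVE = {"growth", "profit", "stable", "solid", "improving"}
SENTIMENT_CLASSES = ["negative", "neutral", "positive"]


def sentiment_label(text: str) -> str:
    """Map a free-text risk assessment to one of three sentiment categories."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def fuse(financials: Dict[str, float], assessment: str, columns: List[str]) -> List[float]:
    """Concatenate tabular features with a one-hot encoded sentiment category."""
    tabular = [financials[c] for c in columns]
    label = sentiment_label(assessment)
    one_hot = [1.0 if label == c else 0.0 for c in SENTIMENT_CLASSES]
    return tabular + one_hot


# Assumed column names, chosen only for illustration.
columns = ["debt_ratio", "current_ratio"]
row = fuse(
    {"debt_ratio": 0.8, "current_ratio": 0.9},
    "Overdue payments and a decline in revenue.",
    columns,
)
```

The resulting vector has the financial features first and the sentiment indicator columns last, so it could be passed to any classifier that accepts tabular input in order to estimate the one-year probability of default.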
Notes
1. Financial embarrassment refers to a state of financial difficulty. Companies in financial embarrassment may have trouble repaying their loans.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El-Qadi, A., Trocan, M., Conde-Cespedes, P., Frossard, T., Díaz-Rodríguez, N. (2023). Credit Risk Scoring Using a Data Fusion Approach. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2023. Lecture Notes in Computer Science, vol 14162. Springer, Cham. https://doi.org/10.1007/978-3-031-41456-5_58
DOI: https://doi.org/10.1007/978-3-031-41456-5_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41455-8
Online ISBN: 978-3-031-41456-5
eBook Packages: Computer Science (R0)