Credit Risk Scoring Using a Data Fusion Approach

El-Qadi, Ayoub; Trocan, Maria; Conde-Cespedes, Patricia; Frossard, Thomas; Díaz-Rodríguez, Natalia

doi:10.1007/978-3-031-41456-5_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14162)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Credit scoring is a vital task in the financial industry for assessing the creditworthiness of companies and mitigating credit risk. In recent years, machine learning algorithms have shown promising results in credit scoring by leveraging large amounts of tabular data. However, traditional tabular data alone may not capture all the information relevant to credit scoring that credit risk analysts typically use. In this paper, we propose a novel approach to company credit scoring that integrates text and tabular data. Our method uses natural language processing techniques to extract key features from risk assessments written by credit risk experts, which are then combined with financial data to predict the likelihood of default within a one-year horizon. We compare several machine learning models across different text embedding techniques. Our results show that adding a textual feature improves the model's ability to identify defaulted companies. More concretely, adding a categorical feature generated by applying sentiment analysis to the text risk assessments yields the best results.
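
The fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: a toy keyword lexicon stands in for the sentiment model, and the word lists, function names (`sentiment_label`, `one_hot`, `fuse`) and financial ratios are all hypothetical. It only shows how a categorical sentiment feature derived from an analyst's text might be appended to tabular values before a classifier is trained.

```python
# Illustrative sketch only: a toy lexicon replaces a trained sentiment model,
# and every name and value below is hypothetical.
from typing import Dict, List

NEGATIVE_WORDS = {"loss", "default", "overdue", "litigation", "decline"}
POSITIVE_WORDS = {"profitable", "growth", "stable", "solid", "improving"}
SENTIMENT_LEVELS = ["negative", "neutral", "positive"]


def sentiment_label(assessment: str) -> str:
    """Map a free-text risk assessment to one of three sentiment categories."""
    tokens = [t.strip(".,;:!?").lower() for t in assessment.split()]
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def one_hot(label: str) -> List[float]:
    """Encode the categorical sentiment as a one-hot vector."""
    return [1.0 if label == level else 0.0 for level in SENTIMENT_LEVELS]


def fuse(financials: Dict[str, float], assessment: str) -> List[float]:
    """Concatenate tabular values (sorted by key) with the one-hot sentiment."""
    tabular = [financials[key] for key in sorted(financials)]
    return tabular + one_hot(sentiment_label(assessment))


# One fused row: two financial ratios followed by three sentiment indicators.
example = fuse(
    {"debt_to_equity": 2.4, "current_ratio": 0.8},
    "Overdue payments and a decline in revenue.",
)
```

Each fused row could then be passed, together with a default/non-default label, to any tabular classifier; in practice the lexicon would presumably be replaced by a pretrained language model fine-tuned for sentiment.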

Notes

1. Financial embarrassment refers to a state of financial difficulty. Companies in financial embarrassment may have problems repaying their loans.

Author information

Authors and Affiliations

Sorbonne Université, Paris, France
Ayoub El-Qadi
Institut Supérieur d’Électronique de Paris, Issy-les-Moulineaux, France
Maria Trocan & Patricia Conde-Cespedes
Tinubu Square, Issy-les-Moulineaux, France
Ayoub El-Qadi & Thomas Frossard
DaSCI Institute, University of Granada, Granada, Spain
Natalia Díaz-Rodríguez

Corresponding author

Correspondence to Ayoub El-Qadi.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Eötvös Loránd University, Budapest, Hungary
János Botzheim
Eötvös Loránd University, Budapest, Hungary
László Gulyás
Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Jan Treur
Universität Münster, Münster, Germany
Gottfried Vossen
Wrocław University of Science and Technology, Wrocław, Poland
Adrianna Kozierkiewicz

About this paper

Cite this paper

El-Qadi, A., Trocan, M., Conde-Cespedes, P., Frossard, T., Díaz-Rodríguez, N. (2023). Credit Risk Scoring Using a Data Fusion Approach. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2023. Lecture Notes in Computer Science, vol 14162. Springer, Cham. https://doi.org/10.1007/978-3-031-41456-5_58

DOI: https://doi.org/10.1007/978-3-031-41456-5_58
Published: 13 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41455-8
Online ISBN: 978-3-031-41456-5
eBook Packages: Computer Science, Computer Science (R0)
