Abstract
Non-traditional data such as an applicant’s bank statement are a significant source of information for loan-granting decisions. We show that methods from network science can be applied to the applicant’s bank statements to convert inherent cash flow characteristics into predictors of default for a credit scoring or credit risk assessment model. First, the credit cash flow is extracted from a bank statement and converted into a visibility graph (network). We then use this visibility network to derive features that predict the borrower’s repayment behaviour. Feature selection methods retain all five extracted features. Finally, SMOTE is used to balance the training data. The model that combines the network features with the standard features outperforms the model that uses only the standard features, indicating the predictive power of the network features.
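As a rough illustration of the time-series-to-network step, the sketch below builds a natural visibility graph from a short series of values and computes a few summary measures. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names `natural_visibility_graph` and `network_features` are hypothetical, the input values are made up, and the measures shown (edge count, degree statistics, density, clustering) are common graph statistics chosen for illustration. The abstract does not name the five features actually used.

```python
from itertools import combinations


def natural_visibility_graph(series):
    """Build the natural visibility graph of a time series.

    Each observation becomes a node. Nodes a < b are linked when every
    intermediate point c lies strictly below the straight line joining
    (a, series[a]) and (b, series[b]).
    Returns a dict mapping node index -> set of neighbour indices.
    """
    n = len(series)
    adj = {i: set() for i in range(n)}
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def network_features(adj):
    """Compute illustrative graph summaries (assumed, not the paper's set)."""
    n = len(adj)
    degrees = [len(neigh) for neigh in adj.values()]
    n_edges = sum(degrees) // 2

    def local_clustering(node):
        # Fraction of neighbour pairs that are themselves connected.
        neigh = adj[node]
        k = len(neigh)
        if k < 2:
            return 0.0
        links = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
        return 2.0 * links / (k * (k - 1))

    return {
        "n_edges": n_edges,
        "avg_degree": sum(degrees) / n if n else 0.0,
        "max_degree": max(degrees) if degrees else 0,
        "density": 2.0 * n_edges / (n * (n - 1)) if n > 1 else 0.0,
        "avg_clustering": sum(local_clustering(v) for v in adj) / n if n else 0.0,
    }


# Hypothetical credit amounts for four consecutive periods.
credits = [1, 3, 2, 4]
graph = natural_visibility_graph(credits)
features = network_features(graph)
```

In a scoring pipeline of the kind the abstract describes, a dictionary like `features` would be computed per applicant and appended to the standard features before model training. The pairwise loop is quadratic or worse in the series length, which is acceptable for short statement histories but would need a faster algorithm for long series.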
Acknowledgements
This article is a result of the project Risk Assessment for Microfinance, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shaji, N., Gama, J., Ribeiro, R.P., Gomes, P. (2022). Bank Statements to Network Features: Extracting Features Out of Time Series Using Visibility Graph. In: Bouadi, T., Fromont, E., Hüllermeier, E. (eds) Advances in Intelligent Data Analysis XX. IDA 2022. Lecture Notes in Computer Science, vol 13205. Springer, Cham. https://doi.org/10.1007/978-3-031-01333-1_22
DOI: https://doi.org/10.1007/978-3-031-01333-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01332-4
Online ISBN: 978-3-031-01333-1
eBook Packages: Computer Science, Computer Science (R0)