Bank Statements to Network Features: Extracting Features Out of Time Series Using Visibility Graph

Shaji, Nirbhaya; Gama, João; Ribeiro, Rita P.; Gomes, Pedro

doi:10.1007/978-3-031-01333-1_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13205)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Non-traditional data, such as an applicant’s bank statement, is a significant source of information for decision-making when granting loans. We find that methods from network science can be applied to the applicant’s bank statements to convert inherent cash flow characteristics into predictors of default for a credit scoring or credit risk assessment model. First, the credit cash flow is extracted from a bank statement and then converted into a visibility graph, or network. We then use this visibility network to derive features that predict the borrowers’ repayment behaviour. Feature selection methods select all five extracted features. Finally, SMOTE is used to balance the training data. The model that combines the network features with the standard features outperforms the model that uses only the standard features, which indicates the predictive power of the network features.
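
As a rough illustration of the conversion step described in the abstract, the sketch below builds a natural visibility graph from a short series and computes a few simple graph statistics. It follows the standard visibility criterion of Lacasa et al. (2008): two points are linked when every point between them lies strictly below the straight line joining them. The function names, the sample `credits` values, and the four statistics are assumptions made for illustration; the abstract does not name the five features used in the paper, so these should not be read as the authors' feature set.

```python
from itertools import combinations


def natural_visibility_edges(series):
    """Return the edges of the natural visibility graph of `series`.

    Points i < j are linked when every point k strictly between them lies
    below the straight line joining (i, series[i]) and (j, series[j]).
    """
    edges = []
    for i, j in combinations(range(len(series)), 2):
        yi, yj = series[i], series[j]
        visible = all(
            series[k] < yj + (yi - yj) * (j - k) / (j - i)
            for k in range(i + 1, j)
        )
        if visible:
            edges.append((i, j))
    return edges


def simple_graph_features(series):
    """Compute illustrative (hypothetical) features of the visibility graph."""
    n = len(series)
    edges = natural_visibility_edges(series)

    # Degree of a node = number of other time points it can "see".
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    possible = n * (n - 1) / 2  # edge count of the complete graph on n nodes
    return {
        "n_edges": len(edges),
        "mean_degree": sum(degree) / n if n else 0.0,
        "max_degree": max(degree, default=0),
        "density": len(edges) / possible if possible else 0.0,
    }


# Hypothetical credit inflows per period, made up for this example.
credits = [120.0, 80.0, 300.0, 50.0, 90.0, 400.0]
features = simple_graph_features(credits)
print(features)
```

This brute-force construction checks every pair of points and is only meant for short series; a production pipeline would likely use a dedicated library and a faster algorithm.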

Acknowledgements

This article is a result of the project Risk Assessment for Microfinance, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Faculty of Sciences, University of Porto, Porto, Portugal
Nirbhaya Shaji & Rita P. Ribeiro
Faculty of Economics, University of Porto, Porto, Portugal
João Gama
INESC TEC, Porto, Portugal
João Gama & Rita P. Ribeiro
Pelican Rhythms LDA, Porto, Portugal
Nirbhaya Shaji & Pedro Gomes

Authors

Nirbhaya Shaji
João Gama
Rita P. Ribeiro
Pedro Gomes

Corresponding author

Correspondence to Nirbhaya Shaji.

Editor information

Editors and Affiliations

University of Rennes, Rennes, France
Tassadit Bouadi
University of Rennes, Rennes, France
Elisa Fromont
University of Munich, LMU, Munich, Germany
Eyke Hüllermeier

About this paper

Cite this paper

Shaji, N., Gama, J., Ribeiro, R.P., Gomes, P. (2022). Bank Statements to Network Features: Extracting Features Out of Time Series Using Visibility Graph. In: Bouadi, T., Fromont, E., Hüllermeier, E. (eds) Advances in Intelligent Data Analysis XX. IDA 2022. Lecture Notes in Computer Science, vol 13205. Springer, Cham. https://doi.org/10.1007/978-3-031-01333-1_22

DOI: https://doi.org/10.1007/978-3-031-01333-1_22
Published: 07 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01332-4
Online ISBN: 978-3-031-01333-1
eBook Packages: Computer Science, Computer Science (R0)
