Propension to customer churn in a financial institution: a machine learning approach

  • Original Article
  • Neural Computing and Applications

Abstract

This paper examines customer churn prediction in the banking sector using a unique customer-level dataset from a large Brazilian bank. Our main contribution is the exploration of this rich dataset, which contains prior client behavior traits that enable us to document new insights into the main determinants of future client churn. We conduct a horserace of many supervised machine learning algorithms under the same cross-validation and evaluation setup, enabling a fair comparison across algorithms. We find that the random forests technique outperforms decision trees, k-nearest neighbors, elastic net, logistic regression, and support vector machines on several metrics. Our investigation reveals that customers who have a stronger relationship with the institution, who hold more products and services, and who borrow more from the bank are less likely to close their checking accounts. Using a back-of-the-envelope estimation, we find that our model could forecast potential losses of up to 10% of the operating result reported by the largest Brazilian banks in 2019, suggesting that the model has a significant economic impact. Our results corroborate the importance of investing in cross-selling and upselling strategies focused on current customers. These strategies can have positive side effects on customer retention.
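
As a rough illustration of what "the same cross-validation and evaluation setup" can mean in practice, the sketch below builds the folds once and reuses them for every learner, so that score differences come from the learners rather than from the data split. Everything in it is an assumption made for illustration: the synthetic two-cluster data, the two stand-in learners (`fit_majority`, `fit_one_nn`), the helper names, and accuracy as the metric. It does not reproduce the paper's dataset, models, or evaluation protocol.

```python
import random


def make_folds(n_samples, k, seed=0):
    """Shuffle the indices once and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_majority(train_X, train_y):
    """Baseline stand-in: always predict the most frequent training label."""
    label = max(sorted(set(train_y)), key=train_y.count)
    return lambda x: label


def fit_one_nn(train_X, train_y):
    """1-nearest-neighbour stand-in using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def cross_val_accuracy(fit, X, y, folds):
    """Mean held-out accuracy of one learner over the supplied folds."""
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic two-cluster data (hypothetical; NOT the bank dataset).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(40)]
y = [0] * 60 + [1] * 40

# Build the folds once and reuse them for every learner.
folds = make_folds(len(X), k=5)
learners = {"majority": fit_majority, "1-NN": fit_one_nn}
results = {name: cross_val_accuracy(fit, X, y, folds)
           for name, fit in learners.items()}

for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean accuracy = {score:.3f}")
```

Passing one shared `folds` object to every learner is the design choice that matters here; regenerating the split per learner would mix split variance into the comparison.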

Notes

  1. Model stability is an important characteristic in empirical work. We refer the reader to [44, 45] for theoretical research in the context of neural networks. An application of related mathematical modeling to the spread of COVID-19 is studied in [1].

  2. For interesting applications see also [2] and [55].

  3. That is, those who move their accounts frequently, have a greater variety of products and services, and obtain conforming bank loans.

  4. We obtain annual figures by extrapolating the sample numbers to a full year, that is, by multiplying the 834,716 customers who dropped out within one semester by two.
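
The extrapolation in note 4 can be written out as a short calculation. This is a minimal sketch that assumes the second semester repeats the churn count observed in the first; the constant and function names are hypothetical and chosen only for illustration.

```python
# Figure quoted in note 4: customers who dropped out within one semester.
SEMESTER_CHURNED = 834_716
SEMESTERS_PER_YEAR = 2


def annualize(period_count, periods_per_year=SEMESTERS_PER_YEAR):
    """Linear extrapolation: assume every period repeats the observed count."""
    return period_count * periods_per_year


annual_churned = annualize(SEMESTER_CHURNED)
print(f"Estimated customers lost per year: {annual_churned:,}")
```

Under that assumption the yearly estimate is 1,669,432 customers. Seasonal differences between semesters would make the true figure deviate from this simple doubling.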

References

  1. Agarwal P, Nieto JJ, Ruzhansky M, Torres DF (2021) Analysis of infectious disease problems (Covid-19) and their global impact. Springer, New York

  2. Ahmed M, Afzal H, Siddiqi I, Amjad M, Khurshid K (2020) Exploring nested ensemble learners using overproduction and choose approach for churn prediction in telecom industry. Neural Comput Appl 32:3237–3251

  3. Au T, Ma G, Li S (2003) Applying and evaluating models to predict customer attrition using data mining techniques. J Comp Int Manag 6(1):10–22

  4. Avon V (2016) Machine learning techniques for customer churn prediction in banking environments. Doctoral thesis, Università degli Studi di Padova, Italy

  5. BACEN (2018) Relatório de Economia Bancária (Banking Report). Banco Central do Brasil. https://www.bcb.gov.br/content/publicacoes/relatorioeconomiabancaria/reb_2018.pdf

  6. Ballings M, Van den Poel D (2012) Customer event history for churn prediction: how long is long enough? Expert Syst Appl 39(18):13517–13522

  7. Berry MJ, Linoff GS (2004) Data mining techniques: for marketing, sales, and customer relationship management. Wiley, USA

  8. Bin L, Peiji S, Juan L (2007) Customer churn prediction based on the decision tree in personal handyphone system service. In: 2007 International Conference on Service Systems and Service Management, pp 1–5. IEEE

  9. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory, pp 144–152

  10. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  11. Breslow LA, Aha DW (1997) Simplifying decision trees: a survey. Knowl Eng Rev 12(1):1–40

  12. Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36(3):4626–4636

  13. Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: a new perspective. Neurocomputing 300:70–79

  14. Capgemini E (2019) World retail banking report (last accessed on 03/28/2020)

  15. Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R et al (2000) CRISP-DM 1.0: step-by-step data mining guide, vol 9. SPSS inc., p 13

  16. Coussement K, Van den Poel D (2008) Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques. Expert Syst Appl 34(1):313–327

  17. Dehghan A, Trafalis T (2012) Examining churn and loyalty using support vector machine. Bus Manag Res 1(4):153

  18. Eastwood M, Gabrys B (2009) A non-sequential representation of sequential data for churn prediction. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp 209–218. Springer

  19. Farquad MAH, Ravi V, Raju SB (2014) Churn prediction using comprehensible support vector machine: an analytical CRM application. Appl Soft Comput 19:31–40

  20. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Computer Syst Sci 55(1):119–139

  21. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions Syst Man Cybern Part C (Appl Rev) 42(4):463–484

  22. Gomber P, Kauffman RJ, Parker C, Weber BW (2018) On the fintech revolution: interpreting the forces of innovation, disruption, and transformation in financial services. J Manag Information Syst 35(1):220–265

  23. Guo G, Wang H, Bell D, Bi Y, Greer K (2003) KNN model-based approach in classification. In: OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, pp 986–996. Springer

  24. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182

  25. Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239

  26. Hiziroglu A, Seymen OF (2014) Modelling customer churn using segmentation and data mining. Front Artif Intell Appl 270:259–271

  27. Idris A, Khan A (2012) Customer churn prediction for telecommunication: employing various various features selection techniques and tree based ensemble classifiers. In: 2012 15th International Multitopic Conference (INMIC), pp 23–27. IEEE

  28. Kaur M, Singh K, Sharma N (2013) Data mining as a tool to predict the churn behaviour among Indian bank customers. Int J Recent Innov Trends Comput Commun 1(9):720–725

  29. Krawczyk B, Schaefer G (2013) An improved ensemble approach for imbalanced classification problems. In: 2013 IEEE 8th international symposium on applied computational intelligence and informatics (SACI), pp 423–426. IEEE

  30. Larivière B, Van den Poel D (2005) Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Syst Appl 29(2):472–484

  31. Liaw A, Wiener M et al (2002) Classification and regression by randomForest. R news 2(3):18–22

  32. Maldonado S, Weber R, Famili F (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sci 286:228–246

  33. Miguéis VL, Van den Poel D, Camanho AS, e Cunha JF (2012) Modeling partial customer churn: on the value of first product-category purchase sequences. Expert Syst Appl 39(12):11250–11256

  34. Mutanen T, Ahola J, Nousiainen S (2006) Customer churn prediction - a case study in retail banking. In: Proc of ECML/PKDD Workshop on Practical Data Mining, pp 13–19

  35. Neslin SA, Gupta S, Kamakura W, Lu J, Mason CH (2006) Defection detection: measuring and understanding the predictive accuracy of customer churn models. J Market Res 43(2):204–211

  36. Nguyen EHX (2011) Customer churn prediction for the Icelandic mobile telephony market. PhD thesis, Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland

  37. Nie G, Rowe W, Zhang L, Tian Y, Shi Y (2011) Credit card churn forecasting by logistic regression and decision tree. Expert Syst Appl 38(12):15273–15285

  38. Nie G, Wang G, Zhang P, Tian Y, Shi Y (2009) Finding the hidden pattern of credit card holder’s churn: a case of China. In: International Conference on Computational Science, pp 561–569. Springer

  39. Patil AP, Deepshika M, Mittal S, Shetty S, Hiremath SS, Patil YE (2017) Customer churn prediction for retail business. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp 845–851. IEEE

  40. Popović D, Bašić BD (2009) Churn prediction model in retail banking using fuzzy C-means algorithm. Informatica 33:2

  41. Prasad UD, Madhavi S (2012) Prediction of churn behavior of bank customers using data mining tools. Bus Intell J 5(1):96–101

  42. Prashanth R, Deepak K, Meher AK (2017) High accuracy predictive modelling for customer churn prediction in telecom industry. In: International Conference on Machine Learning and Data Mining in Pattern Recognition, pp 391–402. Springer

  43. PwC (2014) Retail banking 2020 evolution or revolution? (last accessed on 03/28/2020)

  44. Rajchakit G, Agarwal P, Ramalingam S (2021) Stability analysis of neural networks. Springer, New York

  45. Rajchakit G, Sriraman R, Boonsatit N, Hammachukiattikul P, Lim CP, Agarwal P (2021) Exponential stability in the Lagrange sense for Clifford-valued recurrent neural networks with time delays. Adv Diff Equ 2021:256

  46. Rajeswari M, Devi T (2015) Design of modified ripper algorithm to predict customer churn. Int J Eng Technol 4(2):408

  47. Sabbeh SF (2018) Machine-learning techniques for customer retention: a comparative study. Int J Adv Computer Sci Appl 9(2):273–281

  48. Shaaban E, Helmy Y, Khedr A, Nasr M (2012) A proposed churn prediction model. Int J Eng Res Appl 2(4):693–697

  49. Sharma A, Panigrahi D, Kumar P (2013) A neural network based approach for predicting customer churn in cellular network services. arXiv preprint arXiv:1309.3945

  50. Sia SK, Soh C, Weill P (2016) How DBS bank pursued a digital business strategy. MIS Q Executive 15(2):105–121

  51. Silva TC, Zhao L (2012) Network-based high level data classification. IEEE Transactions Neural Netw Learn Syst 23(6):954–970

  52. Silva TC, Zhao L (2012) Network-based stochastic semisupervised learning. IEEE Transactions Neural Netw Learn Syst 23(3):451–466

  53. Silva TC, Zhao L (2012) Stochastic competitive learning in complex networks. IEEE Transactions Neural Netw Learn Syst 23(3):385–398

  54. Silva TC, Zhao L (2016) Machine learning in complex networks, vol 1. Springer, New York

  55. Sivasankar E, Vijaya J (2019) Hybrid PPFCM-ANN model: an efficient system for customer churn prediction through probabilistic possibilistic fuzzy clustering and artificial neural network. Neural Comput Appl 31:7181–7200

  56. Vafeiadis T, Diamantaras KI, Sarigiannidis G, Chatzisavvas KC (2015) A comparison of machine learning techniques for customer churn prediction. Simul Model Pract Theory 55:1–9

  57. Wang G, Liu L, Peng Y, Nie G, Kou G, Shi Y (2010) Predicting credit card holder churn in banks of China using data mining and MCDM. In: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 3, pp 215–218. IEEE

  58. Wen Z, Yan J, Zhou L, Liu Y, Zhu K, Guo Z, Li Y, Zhang F (2018) Customer churn warning with machine learning. In: The Euro-China Conference on Intelligent Data Analysis and Applications, pp 343–350. Springer

  59. Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390

  60. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Transactions Evolut Comput 1(1):67–82

  61. Xiao J, Xiao Y, Huang A, Liu D, Wang S (2015) Feature-selection-based dynamic transfer ensemble model for customer churn prediction. Knowl Information Syst 43(1):29–51

  62. Xue B, Zhang M, Browne WN, Yao X (2015) A survey on evolutionary computation approaches to feature selection. IEEE Transactions Evolut Comput 20(4):606–626

  63. Zhang Y, Qi J, Shu H, Cao J (2007) A hybrid KNN-LR classifier and its application in customer churn prediction. In: 2007 IEEE International Conference on Systems, Man and Cybernetics, pp 3265–3269. IEEE

  64. Zhao Y, Li B, Li X, Liu W, Ren S (2005) Customer churn prediction using improved one-class support vector machine. In: International Conference on Advanced Data Mining and Applications, pp 300–306. Springer

  65. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Statistical Soc series B (Statistical Methodol) 67(2):301–320

Acknowledgements

Thiago C. Silva (Grant nos. 308171/2019-5, 408546/2018-2) and Benjamin M. Tabak (Grant nos. 310541/2018-2, 425123/2018-9) have received financial support from the CNPq foundation.

Author information

Corresponding author

Correspondence to Benjamin Miranda Tabak.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

de Lima Lemos, R.A., Silva, T.C. & Tabak, B.M. Propension to customer churn in a financial institution: a machine learning approach. Neural Comput & Applic 34, 11751–11768 (2022). https://doi.org/10.1007/s00521-022-07067-x

  • DOI: https://doi.org/10.1007/s00521-022-07067-x
