Abstract
This paper examines customer churn prediction in the banking sector using a unique customer-level dataset from a large Brazilian bank. Our main contribution is the exploration of this rich dataset, which records prior client behavior and allows us to document new insights into the main determinants of future client churn. We run a horse race among several supervised machine learning algorithms under the same cross-validation and evaluation setup, which enables a fair comparison across algorithms. We find that random forests outperform decision trees, k-nearest neighbors, elastic net, logistic regression, and support vector machines on several metrics. Our investigation reveals that customers with a stronger relationship with the institution, that is, those who hold more products and services and who borrow more from the bank, are less likely to close their checking accounts. Using a back-of-the-envelope estimation, we find that our model could anticipate losses of up to 10% of the operating result reported by the largest Brazilian banks in 2019, which suggests that the model has a significant economic impact. Our results corroborate the importance of investing in cross-selling and upselling strategies focused on banks' existing customers. These strategies can have positive side effects on customer retention.
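The comparison setup described in the abstract, in which every algorithm is scored on the same cross-validation folds, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the synthetic data, the two feature names, the majority-class baseline, and the helper names (`stratified_folds`, `horserace`, `make_toy_data`) are all assumptions made for illustration, and plain accuracy stands in for the several metrics used in the paper.

```python
import random
from collections import Counter


def stratified_folds(y, k, seed=0):
    """Assign each sample index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def majority_model(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def knn_model(X_train, y_train, k=5):
    """k-nearest neighbors with squared Euclidean distance and majority vote."""
    def predict(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
            for xt, yt in zip(X_train, y_train)
        )
        votes = Counter(label for _, label in dists[:k])
        return votes.most_common(1)[0][0]
    return predict


def horserace(models, X, y, k=5, seed=0):
    """Score every model on the same folds; return mean accuracy per model."""
    folds = stratified_folds(y, k, seed)
    scores = {name: [] for name in models}
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for name, fit in models.items():
            predict = fit(X_tr, y_tr)
            hits = sum(predict(X[i]) == y[i] for i in test_idx)
            scores[name].append(hits / len(test_idx))
    return {name: sum(s) / len(s) for name, s in scores.items()}


def make_toy_data(n=200, seed=1):
    """Synthetic data (assumption): churners hold fewer products and borrow less."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        churn = 1 if rng.random() < 0.3 else 0
        center = 1.0 if churn else 3.0
        # Hypothetical features: product count and loan balance (arbitrary units).
        X.append([rng.gauss(center, 0.7), rng.gauss(center, 0.7)])
        y.append(churn)
    return X, y


X, y = make_toy_data()
results = horserace({"majority": majority_model, "knn": knn_model}, X, y)
for name, accuracy in results.items():
    print(f"{name}: mean accuracy {accuracy:.3f}")
```

Because the folds are generated once and reused for every model, differences in the scores reflect the algorithms rather than the data split. On an imbalanced churn sample, accuracy alone can be misleading, so a fuller comparison would report class-sensitive metrics as well.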










Notes
That is, customers who transact on their accounts frequently, hold a greater variety of products and services, and obtain conforming bank loans.
We obtain annual figures by extrapolating the sample numbers to a full year (the 834,716 customers who dropped out within one semester, multiplied by two).
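The extrapolation in the preceding note amounts to a single multiplication. The sketch below reproduces it. The function name `annualize` is hypothetical, and the calculation assumes that churn in the second semester matches the observed one.

```python
def annualize(count_per_period: int, periods_per_year: int) -> int:
    """Scale a per-period count to a year, assuming a constant rate across periods."""
    return count_per_period * periods_per_year


SEMESTER_CHURNED = 834_716  # customers who dropped out within one semester (from the note)
SEMESTERS_PER_YEAR = 2

annual_churned = annualize(SEMESTER_CHURNED, SEMESTERS_PER_YEAR)
print(f"{annual_churned:,}")  # 1,669,432
```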
Acknowledgements
Thiago C. Silva (Grants no. 308171/2019-5, 408546/2018-2) and Benjamin M. Tabak (Grants no. 310541/2018-2, 425123/2018-9) have received financial support from the CNPq foundation.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
de Lima Lemos, R.A., Silva, T.C. & Tabak, B.M. Propension to customer churn in a financial institution: a machine learning approach. Neural Comput & Applic 34, 11751–11768 (2022). https://doi.org/10.1007/s00521-022-07067-x
DOI: https://doi.org/10.1007/s00521-022-07067-x