Abstract
Telecommunication companies operate in a highly competitive market where attracting new customers is much more expensive than retaining existing ones. Although retention campaigns can be used to prevent customer churn, their success depends on the availability of accurate prediction models. Churn prediction is a notoriously difficult problem because of the large amount of data, non-linearity, class imbalance, and low separability between churners and non-churners. In this paper, we discuss a real case of churn prediction based on Orange Belgium customer data. In the first part of the paper, we focus on the design of an accurate prediction model. The large imbalance between the two classes is handled with the EasyEnsemble algorithm using a random forest classifier. We also assess the impact of different data preprocessing techniques, including feature selection and feature engineering. Results show that feature selection can reduce computation time and memory requirements, whereas engineering additional variables does not necessarily improve performance. In the second part of the paper, we explore the application of data-driven causal inference, which aims to infer causal relationships between variables from observational data. We conclude that bill shock and wrong tariff plan positioning are putative causes of churn, a finding supported by the prior knowledge of experts at Orange Belgium. Finally, we present a novel method to evaluate the direction and magnitude of the impact of causally relevant variables on churn, under the assumption of no confounding factors.
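The undersampling idea behind EasyEnsemble can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the `CentroidScorer` base learner are all assumptions made for illustration. In particular, `CentroidScorer` is a deliberately simple stand-in for the random forest mentioned in the abstract, chosen so that the example runs with the Python standard library only. Each ensemble member is trained on all minority-class rows (churners) plus an equally sized random sample of majority-class rows, and the members' scores are averaged.

```python
import random
from statistics import mean


def balanced_subsets(X, y, n_subsets, seed=0):
    """Yield balanced training sets: every minority row (label 1) plus an
    equally sized random sample, without replacement, of majority rows."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]  # churners
    neg = [i for i, label in enumerate(y) if label == 0]  # non-churners
    for _ in range(n_subsets):
        idx = pos + rng.sample(neg, len(pos))
        yield [X[i] for i in idx], [y[i] for i in idx]


class CentroidScorer:
    """Hypothetical stand-in base learner (NOT a random forest): scores a
    point by its relative closeness to the churner centroid."""

    def fit(self, X, y):
        pos_rows = [x for x, label in zip(X, y) if label == 1]
        neg_rows = [x for x, label in zip(X, y) if label == 0]
        self.c1 = [mean(col) for col in zip(*pos_rows)]
        self.c0 = [mean(col) for col in zip(*neg_rows)]
        return self

    def score(self, x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        total = d0 + d1
        return d0 / total if total > 0 else 0.5  # in [0, 1], higher = churn


def easy_ensemble_scores(X_train, y_train, X_test, n_subsets=10,
                         make_learner=CentroidScorer, seed=0):
    """Train one learner per balanced subset and average their scores."""
    learners = [make_learner().fit(Xs, ys)
                for Xs, ys in balanced_subsets(X_train, y_train, n_subsets, seed)]
    return [mean(m.score(x) for m in learners) for x in X_test]


# Toy, fully synthetic data: 90 non-churners on a grid near the origin,
# 5 churners near (2, 2). The values carry no meaning beyond the example.
X = [[(i % 10) * 0.1, (i // 10) * 0.1] for i in range(90)]
y = [0] * 90
X += [[2.0, 2.0], [2.2, 1.8], [1.8, 2.1], [2.1, 2.3], [1.9, 1.9]]
y += [1] * 5

scores = easy_ensemble_scores(X, y, [[2.0, 2.0], [0.5, 0.5]])
```

In this sketch, `scores[0]` is the averaged churn score of a point that looks like the toy churners and `scores[1]` that of a point inside the non-churner cloud. Replacing `CentroidScorer` with a random forest only requires an object that exposes the same `fit` and `score` methods.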
Notes
1. SIM-only indicates that the customer bought no other product than the SIM card.
2. For confidentiality reasons, the precise value of the churn rate cannot be disclosed.
3. For confidentiality reasons, the axes scales are concealed.
Additional Figures on Sensitivity Analysis
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Verhelst, T., Caelen, O., Dewitte, JC., Lebichot, B., Bontempi, G. (2020). Understanding Telecom Customer Churn with Machine Learning: From Prediction to Causal Inference. In: Bogaerts, B., et al. Artificial Intelligence and Machine Learning. BNAIC 2019, BENELEARN 2019. Communications in Computer and Information Science, vol 1196. Springer, Cham. https://doi.org/10.1007/978-3-030-65154-1_11
DOI: https://doi.org/10.1007/978-3-030-65154-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65153-4
Online ISBN: 978-3-030-65154-1
eBook Packages: Computer Science, Computer Science (R0)