Understanding Telecom Customer Churn with Machine Learning: From Prediction to Causal Inference

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1196))

Abstract

Telecommunication companies operate in a highly competitive market where attracting new customers is much more expensive than retaining existing ones. Retention campaigns can be used to prevent customer churn, but their success depends on the availability of accurate prediction models. Churn prediction is a notoriously difficult problem because of the large amount of data, non-linearity, class imbalance, and low separability between churners and non-churners. In this paper, we discuss a real case of churn prediction based on Orange Belgium customer data. In the first part of the paper, we focus on the design of an accurate prediction model. The large imbalance between the two classes is handled with the EasyEnsemble algorithm using a random forest classifier. We also assess the impact of different data preprocessing techniques, including feature selection and feature engineering. Results show that feature selection can be used to reduce computation time and memory requirements, whereas engineering variables does not necessarily improve performance. In the second part of the paper, we explore the application of data-driven causal inference, which aims to infer causal relationships between variables from observational data. We conclude that bill shock and wrong tariff plan positioning are putative causes of churn, a conclusion supported by the prior knowledge of experts at Orange Belgium. Finally, we present a novel method to evaluate the direction and magnitude of the impact of causally relevant variables on churn, under the assumption of no confounding factors.
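The imbalance handling mentioned in the abstract can be illustrated with a minimal sketch of EasyEnsemble-style undersampling. It is not the authors' implementation. Each ensemble member is trained on all churners plus an equally sized random draw of non-churners, and the members' predicted churn probabilities are averaged. All names here (`balanced_subsets`, `EasyEnsembleSketch`, `MeanThresholdStump`) are hypothetical, and the stump is only a toy stand-in for the random forest so that the block runs without third-party packages.

```python
import random
from statistics import mean


def balanced_subsets(labels, n_subsets, seed=0):
    """Yield index lists: every churner plus an equally sized random draw of non-churners."""
    rng = random.Random(seed)
    churners = [i for i, y in enumerate(labels) if y == 1]
    others = [i for i, y in enumerate(labels) if y == 0]
    for _ in range(n_subsets):
        # Sampling without replacement; needs at least as many non-churners as churners.
        yield churners + rng.sample(others, len(churners))


class MeanThresholdStump:
    """Toy stand-in for a random forest (assumption): thresholds feature 0
    halfway between the two class means."""

    def fit(self, X, y):
        pos = mean(row[0] for row, label in zip(X, y) if label == 1)
        neg = mean(row[0] for row, label in zip(X, y) if label == 0)
        self.threshold = (pos + neg) / 2
        self.sign = 1 if pos >= neg else -1
        return self

    def predict_proba(self, X):
        # One hard 0/1 churn "probability" per row (not scikit-learn's two-column shape).
        return [1.0 if self.sign * (row[0] - self.threshold) > 0 else 0.0 for row in X]


class EasyEnsembleSketch:
    """Average the churn probabilities of models trained on balanced subsets."""

    def __init__(self, make_model, n_subsets=10, seed=0):
        self.make_model = make_model  # zero-argument factory returning a fresh model
        self.n_subsets = n_subsets
        self.seed = seed

    def fit(self, X, y):
        self.models = []
        for idx in balanced_subsets(y, self.n_subsets, self.seed):
            model = self.make_model()
            model.fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(model)
        return self

    def predict_proba(self, X):
        per_model = [m.predict_proba(X) for m in self.models]
        # zip(*per_model) regroups the predictions row by row across members.
        return [mean(row_preds) for row_preds in zip(*per_model)]
```

In practice, the factory passed as `make_model` would return a real classifier such as a random forest, and `predict_proba` would need adapting to that library's output shape.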

Notes

  1. SIM-only indicates that the customer bought no other product than the SIM card.

  2. For confidentiality reasons, the precise value of the churn rate cannot be disclosed.

  3. For confidentiality reasons, the axes scales are concealed.

Corresponding author

Correspondence to Théo Verhelst.

Additional Figures on Sensitivity Analysis

Fig. 7. Distribution of the predicted probability of churn when a standard deviation is added separately to each variable. Run on the SIM-only dataset. Only variables inducing the most significant change in the distribution are shown (\(p < 10^{-10}\) with a two-sided t-test).

Fig. 8. Distribution of the predicted probability of churn when a standard deviation is subtracted separately from each variable. Run on the SIM-only dataset. Only variables inducing the most significant change in the distribution are shown (\(p < 10^{-10}\) with a two-sided t-test).

Fig. 9. Difference of mean predicted probability of churn when half a standard deviation is added separately to each variable. Run on the SIM-only dataset. Only variables inducing the most significant change in the distribution are shown (\(p < 10^{-10}\) with a two-sided t-test).

Fig. 10. Difference of mean predicted probability of churn when half a standard deviation is subtracted separately from each variable. Run on the SIM-only dataset. Only variables inducing the most significant change in the distribution are shown (\(p < 10^{-10}\) with a two-sided t-test).
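The perturbation scheme described in these captions can be sketched as follows, under stated assumptions. Each variable is shifted in turn by a multiple of its standard deviation, and the predicted churn probabilities are compared with the unperturbed ones. The function names (`perturb`, `welch_t`, `sensitivity`) are hypothetical. The captions do not say which variant of the t-test was used, so the Welch statistic below is an assumption. Turning it into a p-value would need a t-distribution CDF from a statistics library, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def perturb(X, j, n_std):
    """Return a copy of X with n_std standard deviations added to column j."""
    shift = n_std * stdev([row[j] for row in X])
    return [row[:j] + [row[j] + shift] + row[j + 1:] for row in X]


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant; undefined if both are constant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def sensitivity(predict_proba, X, n_std=1.0):
    """Map each column index to (change in mean predicted churn, t statistic).

    predict_proba is any callable mapping a list of rows to a list of
    churn probabilities; use a negative n_std to subtract instead of add.
    """
    base = predict_proba(X)
    report = {}
    for j in range(len(X[0])):
        shifted = predict_proba(perturb(X, j, n_std))
        report[j] = (mean(shifted) - mean(base), welch_t(shifted, base))
    return report
```

Calling `sensitivity(model_fn, X, n_std=0.5)` and `sensitivity(model_fn, X, n_std=-0.5)` would correspond to the half-standard-deviation settings of Figs. 9 and 10. The sign of the first tuple element gives the direction of the effect on predicted churn.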

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Verhelst, T., Caelen, O., Dewitte, JC., Lebichot, B., Bontempi, G. (2020). Understanding Telecom Customer Churn with Machine Learning: From Prediction to Causal Inference. In: Bogaerts, B., et al. Artificial Intelligence and Machine Learning. BNAIC BENELEARN 2019. Communications in Computer and Information Science, vol 1196. Springer, Cham. https://doi.org/10.1007/978-3-030-65154-1_11

  • DOI: https://doi.org/10.1007/978-3-030-65154-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65153-4

  • Online ISBN: 978-3-030-65154-1

  • eBook Packages: Computer Science, Computer Science (R0)
