A Churn Prediction Dataset from the Telecom Sector: A New Benchmark for Uplift Modeling

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Abstract

Uplift modeling, also known as individual treatment effect (ITE) estimation, is an important approach for data-driven decision making that aims to identify the causal impact of an intervention on individuals. This paper introduces a new benchmark dataset for uplift modeling focused on churn prediction, coming from a telecom company in Belgium. Churn, in this context, refers to customers terminating their subscription to the telecom service. This is the first publicly available dataset offering the possibility to evaluate the efficiency of uplift modeling on the churn prediction problem. Moreover, its unique characteristics make it more challenging than the few other public uplift datasets.
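To make the notion of uplift concrete, the following minimal sketch estimates it per customer segment as the difference in churn rate between treated and untreated customers. It is an illustration only, not the method used in the paper: the function name `segment_uplift`, the `(segment, treated, churned)` record layout, and the toy data are all hypothetical, and the estimate has a causal reading only if the treatment was assigned at random.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment from (segment, treated, churned) records.

    Uplift is computed as churn rate among treated customers minus churn
    rate among control customers. A negative value means the intervention
    is associated with less churn in that segment.
    Assumption: treatment was randomly assigned (hypothetical setting).
    """
    # counts[segment][treated_flag] = [number of customers, number of churners]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, churned in records:
        cell = counts[segment][bool(treated)]
        cell[0] += 1
        cell[1] += int(churned)

    uplift = {}
    for segment, cells in counts.items():
        n_treated, churn_treated = cells[True]
        n_control, churn_control = cells[False]
        if n_treated == 0 or n_control == 0:
            continue  # uplift is undefined without both groups
        uplift[segment] = churn_treated / n_treated - churn_control / n_control
    return uplift


# Toy data (invented for illustration): (segment, treated, churned)
records = [
    ("new", 1, 0), ("new", 1, 0), ("new", 1, 1), ("new", 1, 0),
    ("new", 0, 1), ("new", 0, 1), ("new", 0, 0), ("new", 0, 1),
    ("loyal", 1, 0), ("loyal", 1, 1), ("loyal", 0, 0), ("loyal", 0, 1),
]

uplift_by_segment = segment_uplift(records)
for segment, value in sorted(uplift_by_segment.items()):
    print(f"{segment}: {value:+.2f}")
```

In this toy example the "new" segment has a churn rate of 0.25 when treated and 0.75 when untreated, so it would be prioritized for the intervention, while the "loyal" segment shows no difference. Uplift models generalize this idea from coarse segments to individual customers described by many features.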

Funded by the Brussels-Capital Region - Innoviris (Brussels Public Organisation for Research and Innovation) under grant number 2019-PHD-16.

Notes

  1. https://github.com/joshxinjie/Data_Scientist_Nanodegree/tree/master/starbucks_portfolio_exercise

  2. https://www.uplift-modeling.com/en/latest/api/datasets/fetch_lenta.html

  3. https://github.com/TheoVerhelst/Churn-Uplift-Dataset-Paper

Author information

Corresponding author

Correspondence to Théo Verhelst.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Verhelst, T., Mercier, D., Shrestha, J., Bontempi, G. (2025). A Churn Prediction Dataset from the Telecom Sector: A New Benchmark for Uplift Modeling. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_21

  • DOI: https://doi.org/10.1007/978-3-031-74640-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74639-0

  • Online ISBN: 978-3-031-74640-6

  • eBook Packages: Artificial Intelligence (R0)
