Abstract
Uplift modeling, also known as individual treatment effect (ITE) estimation, is an important approach to data-driven decision making that aims to identify the causal impact of an intervention on individuals. This paper introduces a new benchmark dataset for uplift modeling focused on churn prediction, sourced from a telecom company in Belgium. Churn, in this context, refers to customers terminating their subscription to the telecom service. This is the first publicly available dataset that makes it possible to evaluate the efficiency of uplift modeling on the churn prediction problem. Moreover, its unique characteristics make it more challenging than the few other public uplift datasets.
Funded by the Brussels-Capital Region - Innoviris (Brussels Public Organisation for Research and Innovation) under grant number 2019-PHD-16.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Verhelst, T., Mercier, D., Shrestha, J., Bontempi, G. (2025). A Churn Prediction Dataset from the Telecom Sector: A New Benchmark for Uplift Modeling. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_21
DOI: https://doi.org/10.1007/978-3-031-74640-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74639-0
Online ISBN: 978-3-031-74640-6
eBook Packages: Artificial Intelligence (R0)