ABSTRACT
Class imbalance presents significant challenges to customer churn prediction. Traditional machine learning algorithms such as decision trees tend to be biased towards the majority class. In this paper, we comprehensively study the performance of decision trees in churn prediction with class imbalance. We investigate pruning settings and the optimal sampling strategy using a recently developed expected maximum profit criterion. The experiments yield some conclusions that differ from those of previous research that used the area under the ROC curve, and an optimal sampling strategy is recommended. Our findings provide a useful guideline for the use of decision trees in churn prediction.
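The abstract does not say which sampling methods were compared, so the following is only a minimal sketch of one common option, random undersampling of the majority class to a chosen churner share. The function name `undersample_majority`, the parameter `target_minority_share`, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import random


def undersample_majority(labels, target_minority_share, seed=0):
    """Keep every minority row (label 1, churners) and a random subset of
    majority rows (label 0) so that churners make up roughly
    target_minority_share of the returned indices."""
    if not 0 < target_minority_share < 1:
        raise ValueError("target_minority_share must be in (0, 1)")
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority rows needed to reach the requested share.
    wanted = round(len(minority) * (1 - target_minority_share) / target_minority_share)
    keep = min(wanted, len(majority))  # cannot keep more rows than exist
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    return sorted(minority + rng.sample(majority, keep))


# Toy data (hypothetical): 50 churners among 1000 customers.
labels = [1] * 50 + [0] * 950
idx = undersample_majority(labels, target_minority_share=0.25)
resampled = [labels[i] for i in idx]
```

In a study like the one described, a helper of this kind would be applied to the training data only, with several values of the target share compared under the chosen evaluation criterion.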
Index Terms
- Investigating Decision Tree in Churn Prediction with Class Imbalance