DOI: 10.1145/3224207.3224217
research-article

Investigating Decision Tree in Churn Prediction with Class Imbalance

Published: 12 May 2018

ABSTRACT

Class imbalance presents significant challenges to customer churn prediction. Traditional machine learning algorithms such as decision trees tend to be biased towards the majority class. In this paper, we comprehensively study the performance of decision trees in churn prediction with class imbalance. We investigate the pruning setting and the optimal sampling strategy based on a recently developed expected maximum profit criterion. The experiments yield some conclusions that differ from previous research in which the area under the ROC curve was used, and an optimal sampling strategy is recommended. Our findings provide a useful guideline for the use of decision trees in churn prediction.
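As a rough illustration of two ingredients named in the abstract, the sketch below shows one possible sampling strategy (random undersampling of the majority class) and a direct computation of the area under the ROC curve. The abstract does not say which sampling methods the paper compares, so `undersample`, `auc`, the `ratio` parameter, and the convention that label 1 marks a churner are all assumptions made for this example, not the authors' procedure. The expected maximum profit criterion is not implemented here.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class rows (label 0) so that at most
    ratio * n_minority of them remain. Label 1 is assumed to be the
    minority (churner) class; this is an illustrative convention."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), int(ratio * len(minority)))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    random positive is scored above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a typical workflow, `undersample` would be applied to the training split only, and `auc` to a classifier's scores on an untouched test split, so that the evaluation reflects the original class distribution.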

      • Published in

        ICDPA 2018: Proceedings of the International Conference on Data Processing and Applications
        May 2018
        73 pages
        ISBN: 9781450364188
        DOI: 10.1145/3224207

        Copyright © 2018 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited
