
Feature-selection-based dynamic transfer ensemble model for customer churn prediction

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Customer churn prediction is one of the key steps in maximizing the value of customers for an enterprise. Traditional models are built on the assumption that the training and test data follow the same distribution, but in reality customers often come from different districts and may follow different distributions, so such models rarely achieve satisfactory prediction performance. This study proposes a feature-selection-based dynamic transfer ensemble (FSDTE) model that introduces transfer learning to exploit customer data from both the target domain and a related source domain. The model performs a two-layer feature selection. In the first layer, an initial feature subset is selected by a GMDH-type neural network using only the target domain. In the second layer, appropriate patterns from the source domain are selected and added to the target training set, and the features with the highest mutual information with the class variable are combined with the initial subset to form a new feature subset. The second-layer selection is repeated several times to generate a series of feature subsets, and a base classifier is trained on each of them. Finally, the best base classifier is selected dynamically for each test pattern. Experimental results on two customer churn prediction datasets show that FSDTE achieves better performance than traditional churn prediction strategies as well as three existing transfer learning strategies.
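For readers who want a concrete picture of the workflow described above, the following Python sketch outlines the two-layer feature selection and the dynamic classifier selection steps. It is a minimal illustrative sketch, not the authors' implementation: the GMDH-type network of the first layer is replaced here by a univariate mutual-information selector, source-pattern selection uses a simple nearest-neighbour heuristic, and all names and parameters (fsdte_sketch, k_init, k_extra, n_subsets, n_src) are assumptions introduced for illustration.

# Illustrative sketch of the FSDTE workflow (not the authors' reference code).
# Assumptions: the GMDH-type first-layer selection is approximated by a
# univariate mutual-information selector; source patterns are chosen by a
# nearest-neighbour heuristic; parameter values are arbitrary.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def fsdte_sketch(X_tgt, y_tgt, X_src, y_src, X_test,
                 k_init=5, k_extra=3, n_subsets=10, n_src=50, seed=None):
    rng = np.random.default_rng(seed)

    # Layer 1: initial feature subset chosen on the target domain only
    # (stand-in for the GMDH-type neural network used in the paper).
    init_sel = SelectKBest(mutual_info_classif, k=k_init).fit(X_tgt, y_tgt)
    init_idx = np.flatnonzero(init_sel.get_support())

    classifiers, subsets = [], []
    nn_src = NearestNeighbors(n_neighbors=n_src).fit(X_src)
    for _ in range(n_subsets):
        # Layer 2: pick source patterns close to a random target sample,
        # then add features with high mutual information with the class.
        anchor = X_tgt[rng.integers(len(X_tgt))].reshape(1, -1)
        src_idx = nn_src.kneighbors(anchor, return_distance=False).ravel()
        X_aug = np.vstack([X_tgt, X_src[src_idx]])
        y_aug = np.concatenate([y_tgt, y_src[src_idx]])

        mi = mutual_info_classif(X_aug, y_aug, random_state=0)
        extra = [i for i in np.argsort(mi)[::-1] if i not in init_idx][:k_extra]
        feat = np.concatenate([init_idx, np.asarray(extra, dtype=int)])

        clf = DecisionTreeClassifier(random_state=0).fit(X_aug[:, feat], y_aug)
        classifiers.append(clf)
        subsets.append(feat)

    # Dynamic selection: for each test pattern, use the base classifier that is
    # most accurate on its nearest target-domain neighbours (local accuracy).
    nn_tgt = NearestNeighbors(n_neighbors=10).fit(X_tgt)
    preds = []
    for x in X_test:
        neigh = nn_tgt.kneighbors(x.reshape(1, -1), return_distance=False).ravel()
        accs = [clf.score(X_tgt[neigh][:, f], y_tgt[neigh])
                for clf, f in zip(classifiers, subsets)]
        best = int(np.argmax(accs))
        preds.append(classifiers[best].predict(x[subsets[best]].reshape(1, -1))[0])
    return np.array(preds)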


Acknowledgments

Thanks to the anonymous reviewers and the editor for helpful comments on an earlier version of this paper. This research is partially supported by the Natural Science Foundation of China under Grant Nos. 71101100, 70731160635, and 71273036, the New Teachers’ Fund for Doctor Stations of the Ministry of Education under Grant No. 20110181120047, the Excellent Youth Fund of Sichuan University under Grant No. 2013SCU04A08, the China Postdoctoral Science Foundation under Grant Nos. 2011M500418, 2012T50148, and 2013M530753, the Frontier and Cross-innovation Foundation of Sichuan University under Grant No. skqy201352, the Soft Science Foundation of Sichuan Province under Grant No. 2013ZR0016, the Humanities and Social Sciences Youth Foundation of the Ministry of Education of PR China under Grant No. 11YJC870028, and the Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE under Grant No. CCNU13F030.

Author information

Corresponding author

Correspondence to Shouyang Wang.

About this article

Cite this article

Xiao, J., Xiao, Y., Huang, A. et al. Feature-selection-based dynamic transfer ensemble model for customer churn prediction. Knowl Inf Syst 43, 29–51 (2015). https://doi.org/10.1007/s10115-013-0722-y

