Abstract
Customer churn prediction is a key step in maximizing the value of customers to an enterprise. Traditional models assume that the training and test data follow the same distribution, so they rarely achieve satisfactory prediction performance in practice, where customers usually come from different districts and may follow different distributions. This study proposes a feature-selection-based dynamic transfer ensemble (FSDTE) model that introduces transfer learning theory to utilize customer data in both the target domain and related source domains. The model conducts a two-layer feature selection. In the first layer, an initial feature subset is selected by a GMDH-type neural network using only the target domain. In the second layer, several appropriate patterns from the source domain are selected for transfer to the target training set, and some features that have higher mutual information with the class variable are combined with the initial subset to construct a new feature subset. The second-layer selection is repeated several times to generate a series of new feature subsets, and a base classifier is trained on each one. Finally, the best base classifier is selected dynamically for each test pattern. Experimental results on two customer churn prediction datasets show that FSDTE achieves better performance than traditional churn prediction strategies and than three existing transfer learning strategies.
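Two of the ingredients named in the abstract, ranking features by their mutual information with the class variable and choosing one base classifier per test pattern, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the GMDH-type first layer and the source-pattern selection are omitted, features are assumed to be discrete, and the selection rule (accuracy on the nearest validation patterns) is an assumed stand-in, since the abstract does not state the criterion. All function names, feature names, and data values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def top_k_features(columns, labels, k):
    """Names of the k features with the highest mutual information with labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_classifier(classifiers, validation, x, n_neighbors=3):
    """Pick the classifier that is most accurate on the validation patterns nearest to x.

    Assumption: local accuracy is used as the selection rule for illustration only.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(validation, key=lambda p: squared_distance(p[0], x))[:n_neighbors]

    def local_accuracy(clf):
        return sum(clf(vx) == vy for vx, vy in nearest) / len(nearest)

    return max(classifiers, key=local_accuracy)


# Hypothetical toy data: 1 = churned, 0 = stayed.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "contract_type": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to the labels
    "region": [0, 1, 0, 1, 0, 1, 0, 1],          # weakly related to the labels
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],           # independent of the labels
}
selected = top_k_features(columns, labels, k=2)


def always_stay(x):
    return 0


def always_churn(x):
    return 1


# Hypothetical validation patterns: (feature vector, true label).
validation = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
              ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]
chosen = select_classifier([always_stay, always_churn], validation, (0.05,))
print(selected, chosen.__name__)
```

In this toy setup the feature that duplicates the labels ranks first and the independent feature is dropped, while the classifier chosen for a test pattern depends on which validation patterns lie closest to it.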
Acknowledgments
Thanks to the anonymous reviewers and the editor for helpful comments on an earlier version of this paper. This research is partly supported by the Natural Science Foundation of China under Grant Nos. 71101100, 70731160635, and 71273036, New Teachers’ Fund for Doctor Stations, Ministry of Education under Grant No. 20110181120047, Excellent Youth Fund of Sichuan University under Grant No. 2013SCU04A08, China Postdoctoral Science Foundation under Grant Nos. 2011M500418, 2012T50148 and 2013M530753, Frontier and Cross-innovation Foundation of Sichuan University under Grant No. skqy201352, Soft Science Foundation of Sichuan Province under Grant No. 2013ZR0016, Humanities and Social Sciences Youth Foundation of the Ministry of Education of PR China under Grant No. 11YJC870028, and Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE under Grant No. CCNU13F030.
Cite this article
Xiao, J., Xiao, Y., Huang, A. et al. Feature-selection-based dynamic transfer ensemble model for customer churn prediction. Knowl Inf Syst 43, 29–51 (2015). https://doi.org/10.1007/s10115-013-0722-y
DOI: https://doi.org/10.1007/s10115-013-0722-y