Abstract
Knowledge-based churn prediction and decision making are invaluable for telecom companies operating in highly competitive markets. The comprehensiveness and actionability of a data-driven churn prediction system depend on the effective extraction of hidden patterns from the data. Generally, data analytics is employed to extrapolate the patterns extracted from the training dataset to the test set. In this study, one step further is taken: improved prediction performance is attained by capturing the individuality of each customer while discovering the hidden patterns in the training set and then applying all of that knowledge to the test set. The proposed churn prediction system is developed using artificial neural networks that take advantage of both self-organizing and error-driven learning approaches (ChP-SOEDNN). We introduce a new dimension to the study of churn prediction in the telecom industry: a systematic and profit-driven churn decision-making framework. The comparison of ChP-SOEDNN with other techniques shows its superiority regarding both accuracy and misclassification cost. Misclassification cost is a realistic criterion, introduced in this article, that measures the success of a method in finding the set of decisions that leads to the minimum possible loss of profit. Moreover, ChP-SOEDNN shows capability in devising a cost-efficient retention strategy for each cluster of customers, in addition to strength in dealing with the imbalanced class distribution that is typical of churn prediction problems.
References
Ahmed U, Khan A, Khan SH, Basit A, Haq IU, Lee YS (2019) Transfer learning and meta classification based deep churn prediction system for telecom industry. Preprint arXiv:1901.06091
Amin A, Al-Obeidat F, Shah B, Adnan A, Loo J, Anwar S (2019) Customer churn prediction in telecommunication industry using data certainty. J Bus Res 94:290–301
Amin A, Anwar S, Adnan A, Nawaz M, Alawfi K, Hussain A, Huang K (2017) Customer churn prediction in the telecommunication sector using a rough set approach. Neurocomputing 237:242–254
Amin A, Anwar S, Adnan A, Nawaz M, Howard N, Qadir J, Hawalah A, Hussain A (2016) Comparing oversampling techniques to handle the class imbalance problem: a customer churn prediction case study. IEEE Access 4:7940–7957
Bahnsen AC, Aouada D, Ottersten B (2015) Example-dependent cost-sensitive decision trees. Expert Syst Appl 42(19):6609–6619
Bahnsen AC, Aouada D, Ottersten B (2015) A novel cost-sensitive framework for customer churn predictive modeling. Decis Anal 2(1):5
Berger PD, Nasr NI (1998) Customer lifetime value: Marketing models and applications. J Interact Mark 12(1):17–30
Bi W, Cai M, Liu M, Li G (2016) A big data clustering algorithm for mitigating the risk of customer churn. IEEE Trans Ind Inf 12(3):1270–1281
Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36(3):4626–4636
Chen Z-Y, Fan Z-P (2012) Distributed customer behavior prediction using multiplex data: a collaborative MK-SVM approach. Knowl Based Syst 35:111–119
Chen Z-Y, Fan Z-P, Sun M (2012) A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur J Oper Res 223(2):461–472
De Bock KW, Van den Poel D (2012) Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models. Expert Syst Appl 39(8):6816–6826
Ekinci Y, Ülengin F, Uray N, Ülengin B (2014) Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. Eur J Oper Res 237(1):278–288
Fader PS, Hardie BG, Lee KL (2005) RFM and CLV: using iso-value curves for customer base analysis. J Mark Res 42(4):415–430
García DL, Nebot À, Vellido A (2017) Intelligent data analysis approaches to churn as a business problem: a survey. Knowl Inf Syst 51(3):719–774
Glady N, Baesens B, Croux C (2009) Modeling churn using customer lifetime value. Eur J Oper Res 197(1):402–411
Gruca TS, Rego LL (2005) Customer satisfaction, cash flow, and shareholder value. J Mark 69(3):115–130
Gupta S, Lehmann DR, Stuart JA (2004) Valuing customers. J Mark Res 41(1):7–18
Gurney K (2014) Multilayer nets and backpropagation. In: An introduction to neural networks, 1st edn. CRC Press, Boca Raton, pp 41–57
Han S, Yuan B, Liu W (2009) Rare class mining: progress and prospect. In: 2009 Chinese conference on pattern recognition (CCPR). IEEE, New York, pp 1–5
Höppner S, Stripling E, Baesens B, vanden Broucke S, Verdonck T (2018) Profit driven decision trees for churn prediction. Eur J Oper Res 286(3):920–933
Huang B, Kechadi MT, Buckley B (2012) Customer churn prediction in telecommunications. Expert Syst Appl 39(1):1414–1425
Huang Y, Kechadi T (2013) An effective hybrid learning system for telecommunication churn prediction. Expert Syst Appl 40(14):5635–5647
Idris A, Khan A, Lee YS (2012) Genetic programming and adaboosting based churn prediction for telecom. In: 2012 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, New York, pp 1328–1332
Idris A, Khan A, Lee YS (2013) Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39(3):659–672
Idris A, Rizwan M, Khan A (2012) Churn prediction in telecom using random forest and PSO based data balancing in combination with various feature selection strategies. Comput Electr Eng 38(6):1808–1819
Keramati A, Jafari-Marandi R (2014) Webpage clustering—taking the zero step: a case study of an Iranian website. J Web Eng 13(3–4):333–360
Jafari-Marandi R, Davarzani S, Gharibdousti MS, Smith BK (2018) An optimum ANN-based breast cancer diagnosis: bridging gaps between ANN learning and decision-making goals. Appl Soft Comput 72:108–120
Jafari-Marandi R, Khanzadeh M, Smith BK, Bian L (2017) Self-organizing and error driven (SOED) artificial neural network for smarter classifications. J Comput Des Eng 4(4):282–304
Jafari-Marandi R, Khanzadeh M, Tian W, Smith B, Bian L (2019) From in-situ monitoring toward high-throughput process control: cost-driven decision-making framework for laser-based additive manufacturing. J Manufact Syst 51:29–41
Keramati A, Jafari-Marandi R, Aliannejadi M, Ahmadian I, Mozaffari M, Abbasi U (2014) Improved churn prediction in telecommunication industry using data mining techniques. Appl Soft Comput 24:994–1012
Khan A, Sohail A, Ali A (2018) A new channel boosted convolutional neural network using transfer learning. Preprint arXiv:1804.08528
Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65
Lee H, Lee Y, Cho H, Im K, Kim YS (2011) Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model. Decis Support Syst 52(1):207–216
Lemmens A, Gupta S (2017) Managing churn to maximize profits. Working paper, available at SSRN 2964906
Liu Y, Zhuang Y (2015) Research model of churn prediction based on customer segmentation and misclassification cost in the context of big data. J Comput Commun 3(06):87
Lu N, Lin H, Lu J, Zhang G (2014) A customer churn prediction model in telecom industry using boosting. IEEE Trans Ind Inf 10(2):1659–1665
Maldonado S, López J, Vairetti C (2019) Profit-based churn prediction based on minimax probability machines. Eur J Oper Res 284(1):273–284
Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2):427–436
Meilă M (2007) Comparing clusterings—an information based distance. J Multivar Anal 98(5):873–895
World Health Organization (2010) World health statistics. World Health Organization, Geneva
Prashanth R, Deepak K, Meher AK (2017) High accuracy predictive modelling for customer churn prediction in telecom industry. In: International conference on machine learning and data mining in pattern recognition. Springer, Berlin, pp 391–402
Reinartz WJ, Kumar V (2003) The impact of customer relationship characteristics on profitable lifetime duration. J Mark 67(1):77–99
Risselada H, Verhoef PC, Bijmolt TH (2010) Staying power of churn prediction models. J Interact Mark 24(3):198–208
Saito T, Rehmsmeier M (2015) The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432
Sheng VS, Ling CX (2006) Thresholding for making classifiers cost-sensitive. In: AAAI, pp 476–481
Stripling E, vanden Broucke S, Antonio K, Baesens B, Snoeck M (2018) Profit maximizing logistic model for customer churn prediction using genetic algorithms. Swarm Evol Comput 40:116–130
Sun Y, Kamel MS, Wong AK, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 40(12):3358–3378
Tan PN, Steinbach M, Kumar V (2016) Introduction to data mining. Pearson Education, India
Tang L, Thomas L, Fletcher M, Pan J, Marshall A (2014) Assessing the impact of derived behavior information on customer attrition in the financial service industry. Eur J Oper Res 236(2):624–633
Ullah I, Raza B, Malik AK, Imran M, Islam SU, Kim SW (2019) A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access 7:60134–60149
van Wezel M, Potharst R (2007) Improved customer choice predictions using ensemble methods. Eur J Oper Res 181(1):436–452
Verbeke W, Dejaeger K, Martens D, Hur J, Baesens B (2012) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. Eur J Oper Res 218(1):211–229
Verbraken T, Verbeke W, Baesens B (2013) A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Trans Knowl Data Eng 25(5):961–973
Wei C-P, Chiu I-T (2002) Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst Appl 23(2):103–112
Zhang C, Ni M, Yin H, Qiu K (2018) Developed density peak clustering with support vector data description for access network intrusion detection. IEEE Access 6:46356–46362
Zhou Z-H, Liu X-Y (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77
Zhu B, Baesens B, vanden Broucke SK (2017) An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf Sci 408:84–99
Zhu H, Wang X (2017) A cost-sensitive semi-supervised learning model based on uncertainty. Neurocomputing 251:106–114
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Video 1 (MP4 74563 kb)
Appendices
Appendix 1
Accepting the membership of the test set, as predicted by the SOM trained on the train and validation sets, as the true test set membership is an important assumption. It has therefore been tested by comparing the membership of test set cases when the SOM learns from all customers in the dataset with the membership obtained when the SOM learns only from the train and validation sets.
In the literature, there are measures for comparing the similarity of two clusterings. Clustering, unlike classification, does not have the luxury of class labels, a fact that makes comparing and evaluating the performance of different methods challenging. There are two proposed approaches for comparing two sets of clusters (C and C′) on the same set of data points: counting pairs and set matching [41]. We introduce only two common pair-counting measures. When comparing C and C′, each pair of data points falls under one of four cases based on its memberships in C and C′, denoted \(\bullet \bullet\), \(\circ \circ\), \(\bullet \circ\), and \(\circ \bullet\). Two filled circles denote that both data points are in the same cluster under both C and C′, and two hollow circles signify that the pair is in the same cluster under neither C nor C′; these two cases capture the similarities between C and C′. One filled circle and one hollow circle indicate that the pair is a member of the same cluster under only one of C and C′, but not the other. Equations 10 and 11 give, respectively, the Fowlkes and Mallows measure and Rand's measure. In these formulas \(N_{\bullet \bullet}\), \(N_{\circ \circ}\), n, k, k′, \(n_k\), and \(n_{k'}\) stand, respectively, for the number of pairs that fall in the same cluster under both C and C′, the number of pairs that are in the same cluster under neither C nor C′, the number of data points, the number of clusters under C, the number of clusters under C′, the number of data points in cluster k of C, and the number of data points in cluster k′ of C′:

$$\mathrm{FM}(C, C') = \frac{N_{\bullet \bullet}}{\sqrt{\left(\sum_{k} \frac{n_k(n_k - 1)}{2}\right)\left(\sum_{k'} \frac{n_{k'}(n_{k'} - 1)}{2}\right)}} \qquad (10)$$

$$\mathrm{Rand}(C, C') = \frac{N_{\bullet \bullet} + N_{\circ \circ}}{n(n - 1)/2} \qquad (11)$$
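For concreteness, the following is a minimal Python sketch of the two pair-counting measures; it implements the standard Fowlkes and Mallows and Rand formulations written above (the function names are ours):

```python
from itertools import combinations
from collections import Counter
from math import comb, sqrt

def pair_counts(c, cp):
    """Count pairs in the same cluster under both C and C' (N**),
    and pairs in different clusters under both (Noo)."""
    n_ss = n_dd = 0
    for i, j in combinations(range(len(c)), 2):
        same_c, same_cp = c[i] == c[j], cp[i] == cp[j]
        n_ss += same_c and same_cp
        n_dd += (not same_c) and (not same_cp)
    return n_ss, n_dd

def fowlkes_mallows(c, cp):
    """Eq. 10: N** normalized by the geometric mean of the
    within-cluster pair counts of C and C'."""
    n_ss, _ = pair_counts(c, cp)
    w_c = sum(comb(s, 2) for s in Counter(c).values())
    w_cp = sum(comb(s, 2) for s in Counter(cp).values())
    return n_ss / sqrt(w_c * w_cp)

def rand_index(c, cp):
    """Eq. 11: fraction of all pairs on which C and C' agree."""
    n_ss, n_dd = pair_counts(c, cp)
    return (n_ss + n_dd) / comb(len(c), 2)

# Two clusterings of six points
c_a = [0, 0, 1, 1, 2, 2]
c_b = [0, 0, 0, 1, 2, 2]
print(fowlkes_mallows(c_a, c_b))  # ~0.577
print(rand_index(c_a, c_b))       # 0.8
```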
Although these measures are capable of comparing two different SOM outputs [27], the output of a SOM carries a dimension beyond plain clustering that these measures do not capture. In an ordinary clustering there is no defined relationship between two clusters; however, if each neuron in a SOM is taken to be a cluster, each cluster has neighboring clusters. In fact, this is why in Step 6 of the SOED procedure (Algorithm 1) XY coordinates are assigned to the members of each cluster. The measures introduced in Eqs. 10 and 11 cannot capture this added dimension: both equations rest only on how similarly the clustering technique has segmented all pairs of data points into clusters. The proposed comparison measure, expressly designed for SOM outputs with the same topologies, captures the same similarity while including the neighboring dimension. In Eq. 12, n, α, Loc_C(i), and dist(P1, P2) are, respectively, the number of data points, a constant parameter that falls between zero and the maximum possible distance between any two clusters in the SOM output, the location of the cluster that has data point i as a member, and the distance between P1 and P2. Preliminary experiments determine α so that SOMC returns 0 for two random clusterings; α may differ across SOM topologies and is set to 1.97 in the case of the 7 × 7 SOM output.
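Since Eq. 12 itself is not reproduced in this appendix, the Python sketch below shows only one plausible reading of SOMC consistent with the stated properties (identical maps score 1; random maps score about 0 once α is calibrated). The function and its exact form are our assumption, not the authors' formula:

```python
import numpy as np

def somc(locs_c, locs_cp, alpha=1.97):
    """Hypothetical sketch of SOMC: penalize each data point by the grid
    distance between its winning neurons under the two SOM trainings.
    locs_c[i] and locs_cp[i] are assumed to be the XY coordinates
    Loc_C(i) and Loc_C'(i); alpha is calibrated so that two random
    clusterings score about 0 (1.97 for the 7 x 7 map in the text).
    The published Eq. 12 may differ, e.g., in how it is bounded to [-1, 1]."""
    d = np.linalg.norm(np.asarray(locs_c, float) - np.asarray(locs_cp, float), axis=1)
    return 1.0 - d.mean() / alpha

# Example: three points mapped to nearby neurons under two SOM runs
print(somc([(0, 0), (3, 3), (6, 6)], [(0, 1), (3, 3), (6, 5)]))  # ~0.66
```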
The comparisons between different SOMs made by Keramati and Jafari-Marandi [27] revealed that Rand's measure (Eq. 11) is not as discriminating as Fowlkes and Mallows (Eq. 10). Table 4 presents the Fowlkes and Mallows measure (FM, Eq. 10) and the proposed measure of Eq. 12 (SOMC) for the output of different SOM settings: Complete Data (CD), Train Data (TD), 10% TD, Benchmark 1 (B1), and Benchmark 2 (B2). Complete Data and Train Data stand for the experiments that are the main point of comparison in Table 5. Complete Data (CD) is the SOM setting that uses the complete data (train, validation, and test sets) to derive the membership vector of the test set; Train Data (TD) uses only the train and validation sets for the same output; and 10% TD uses only 10% of the train set. Benchmark 1 (B1) is a simulated SOM output that assigns a random membership vector to the test set, whereas Benchmark 2 (B2) is an intentional membership vector in which all cases are members of cluster 1. B1 and B2 serve as points of reference for understanding the range of the applied measures. FM ranges between 0 and 1, zero being no similarity and one a perfect match. SOMC ranges between −1 and 1: negative one being negative matching, zero random matching, and one complete matching. Every cell in Table 4 is the average of ten experiments, to control for the random nature of SOM, whereas every cell in Table 5 is the value of SOMC or FM for the presented example.
The comparison between the memberships of test set records under the CD and TD cases shows a meaningful similarity. Based on this similarity, we accept the validity of the membership of the test set as predicted by the SOM trained using only the train and validation sets. The misclassification cost of the members of the test set is then calculated based on the customer revenue computed for each cluster in MOD.
Appendix 2
Profiling the clusters of customers is valuable because it creates insight into the kinds of customers that lie in the dividing part of the SOED map. Figure 5a shows these regions within the employed SOED map. N1 to N7 are the clusters housing non-churn cases, and C1 to C5 are the clusters of churn cases whose profiling is worthwhile for SOED's procedure. It is helpful to look at the average value of each attribute with the help of a color-coded map based on the scaled averages across all clusters. Figure 15 presents these maps for all the attributes in the dataset.
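As a minimal sketch of how such a profile can be produced, the following assumes a hypothetical customer table with a cluster column holding the SOED cluster labels; the file name and column names are illustrative:

```python
import pandas as pd

# Hypothetical input: one row per customer, a 'cluster' label from the
# SOED map (e.g., 'N1'..'N7', 'C1'..'C5') plus the dataset attributes.
df = pd.read_csv("customers_with_clusters.csv")  # illustrative file name

# Average value of each attribute per cluster.
profile = df.groupby("cluster").mean(numeric_only=True)

# Min-max scale every attribute across clusters so that a color-coded
# map (as in Fig. 15) highlights relative differences between clusters.
scaled = (profile - profile.min()) / (profile.max() - profile.min())
print(scaled.round(2))
```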
Appendix 3
See Tables 6, 7, 8, 9, 10, and 11, respectively, representing the 20 validation runs for CS MLP 1–4, CS DT, and CS AdaBoost (Table 12).
Cite this article
Jafari-Marandi, R., Denton, J., Idris, A. et al. Optimum profit-driven churn decision making: innovative artificial neural networks in telecom industry. Neural Comput & Applic 32, 14929–14962 (2020). https://doi.org/10.1007/s00521-020-04850-6