Optimum profit-driven churn decision making: innovative artificial neural networks in telecom industry

  • Original Article
Neural Computing and Applications

Abstract

Knowledge-based churn prediction and decision making are invaluable for telecom companies because of their highly competitive markets. The comprehensiveness and actionability of a data-driven churn prediction system depend on the effective extraction of hidden patterns from the data. Generally, data analytics is employed to extrapolate the patterns extracted from the training dataset to the test set. In this study, we take one step further: improved prediction performance is attained by capturing the individuality of each customer while discovering the hidden patterns in the training set, and then applying all of this knowledge to the test set. The proposed churn prediction system is developed using artificial neural networks that take advantage of both self-organizing and error-driven learning approaches (ChP-SOEDNN). We introduce a new dimension to the study of churn prediction in the telecom industry: a systematic and profit-driven churn decision-making framework. A comparison of ChP-SOEDNN with other techniques shows its superiority in both accuracy and misclassification cost. Misclassification cost, a realistic criterion this article introduces, measures the success of a method in finding the set of decisions that leads to the minimum possible loss of profit. Moreover, ChP-SOEDNN is capable of devising a cost-efficient retention strategy for each cluster of customers, in addition to its strength in dealing with the imbalanced class distribution that is typical of churn prediction problems.




References

  1. Ahmed U, Khan A, Khan SH, Basit A, Haq IU, Lee YS (2019) Transfer learning and meta classification based deep churn prediction system for telecom industry. Preprint arXiv:1901.06091

  2. Amin A, Al-Obeidat F, Shah B, Adnan A, Loo J, Anwar S (2019) Customer churn prediction in telecommunication industry using data certainty. J Bus Res 94:290–301

  3. Amin A, Anwar S, Adnan A, Nawaz M, Alawfi K, Hussain A, Huang K (2017) Customer churn prediction in the telecommunication sector using a rough set approach. Neurocomputing 237:242–254

  4. Amin A, Anwar S, Adnan A, Nawaz M, Howard N, Qadir J, Hawalah A, Hussain A (2016) Comparing oversampling techniques to handle the class imbalance problem: a customer churn prediction case study. IEEE Access 4:7940–7957

  5. Bahnsen AC, Aouada D, Ottersten B (2015) Example-dependent cost-sensitive decision trees. Expert Syst Appl 42(19):6609–6619

  6. Bahnsen AC, Aouada D, Ottersten B (2015) A novel cost-sensitive framework for customer churn predictive modeling. Decis Anal 2(1):5

  7. Berger PD, Nasr NI (1998) Customer lifetime value: marketing models and applications. J Interact Mark 12(1):17–30

  8. Bi W, Cai M, Liu M, Li G (2016) A big data clustering algorithm for mitigating the risk of customer churn. IEEE Trans Ind Inf 12(3):1270–1281

  9. Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36(3):4626–4636

  10. Chen Z-Y, Fan Z-P (2012) Distributed customer behavior prediction using multiplex data: a collaborative MK-SVM approach. Knowl Based Syst 35:111–119

  11. Chen Z-Y, Fan Z-P, Sun M (2012) A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur J Oper Res 223(2):461–472

  12. De Bock KW, Van den Poel D (2012) Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models. Expert Syst Appl 39(8):6816–6826

  13. Ekinci Y, Ülengin F, Uray N, Ülengin B (2014) Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. Eur J Oper Res 237(1):278–288

  14. Fader PS, Hardie BG, Lee KL (2005) RFM and CLV: using iso-value curves for customer base analysis. J Mark Res 42(4):415–430

  15. García DL, Nebot À, Vellido A (2017) Intelligent data analysis approaches to churn as a business problem: a survey. Knowl Inf Syst 51(3):719–774

  16. Glady N, Baesens B, Croux C (2009) Modeling churn using customer lifetime value. Eur J Oper Res 197(1):402–411

  17. Gruca TS, Rego LL (2005) Customer satisfaction, cash flow, and shareholder value. J Mark 69(3):115–130

  18. Gupta S, Lehmann DR, Stuart JA (2004) Valuing customers. J Mark Res 41(1):7–18

  19. Gurney K (2014) Multilayer nets and backpropagation. In: An introduction to neural networks, 1st edn. CRC Press, Boca Raton, pp 41–57

  20. Han S, Yuan B, Liu W (2009) Rare class mining: progress and prospect. In: CCPR 2009, Chinese conference on pattern recognition. IEEE, New York, pp 1–5

  21. Höppner S, Stripling E, Baesens B, vanden Broucke S, Verdonck T (2018) Profit driven decision trees for churn prediction. Eur J Oper Res 286(3):920–933

  22. Huang B, Kechadi MT, Buckley B (2012) Customer churn prediction in telecommunications. Expert Syst Appl 39(1):1414–1425

  23. Huang Y, Kechadi T (2013) An effective hybrid learning system for telecommunication churn prediction. Expert Syst Appl 40(14):5635–5647

  24. Idris A, Khan A, Lee YS (2012) Genetic programming and adaboosting based churn prediction for telecom. In: 2012 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, New York, pp 1328–1332

  25. Idris A, Khan A, Lee YS (2013) Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39(3):659–672

  26. Idris A, Rizwan M, Khan A (2012) Churn prediction in telecom using random forest and PSO based data balancing in combination with various feature selection strategies. Comput Electr Eng 38(6):1808–1819

  27. Keramati A, Jafari-Marandi R (2014) Webpage clustering—taking the zero step: a case study of an Iranian website. J Web Eng 13(3–4):333–360

  28. Jafari-Marandi R, Davarzani S, Gharibdousti MS, Smith BK (2018) An optimum ANN-based breast cancer diagnosis: bridging gaps between ANN learning and decision-making goals. Appl Soft Comput 72:108–120

  29. Jafari-Marandi R, Khanzadeh M, Smith BK, Bian L (2017) Self-organizing and error driven (SOED) artificial neural network for smarter classifications. J Comput Des Eng 4(4):282–304

  30. Jafari-Marandi R, Khanzadeh M, Tian W, Smith B, Bian L (2019) From in-situ monitoring toward high-throughput process control: cost-driven decision-making framework for laser-based additive manufacturing. J Manuf Syst 51:29–41

  31. Keramati A, Jafari-Marandi R, Aliannejadi M, Ahmadian I, Mozaffari M, Abbasi U (2014) Improved churn prediction in telecommunication industry using data mining techniques. Appl Soft Comput 24:994–1012

  32. Khan A, Sohail A, Ali A (2018) A new channel boosted convolutional neural network using transfer learning. Preprint arXiv:1804.08528

  33. Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65

  34. Lee H, Lee Y, Cho H, Im K, Kim YS (2011) Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model. Decis Support Syst 52(1):207–216

  35. Lemmens A, Gupta S (2017) Managing churn to maximize profits. Working paper

  36. Lemmens A, Gupta S (2017) Managing churn to maximize profits. Available at SSRN 2964906

  37. Liu Y, Zhuang Y (2015) Research model of churn prediction based on customer segmentation and misclassification cost in the context of big data. J Comput Commun 3(6):87

  38. Lu N, Lin H, Lu J, Zhang G (2014) A customer churn prediction model in telecom industry using boosting. IEEE Trans Ind Inf 10(2):1659–1665

  39. Maldonado S, López J, Vairetti C (2019) Profit-based churn prediction based on minimax probability machines. Eur J Oper Res 284(1):273–284

  40. Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2):427–436

  41. Meilă M (2007) Comparing clusterings—an information based distance. J Multivar Anal 98(5):873–895

  42. World Health Organization (2010) World health statistics. World Health Organization, New York

  43. Prashanth R, Deepak K, Meher AK (2017) High accuracy predictive modelling for customer churn prediction in telecom industry. In: International conference on machine learning and data mining in pattern recognition. Springer, Berlin, pp 391–402

  44. Reinartz WJ, Kumar V (2003) The impact of customer relationship characteristics on profitable lifetime duration. J Mark 67(1):77–99

  45. Risselada H, Verhoef PC, Bijmolt TH (2010) Staying power of churn prediction models. J Interact Mark 24(3):198–208

  46. Saito T, Rehmsmeier M (2015) The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432

  47. Sheng VS, Ling CX (2006) Thresholding for making classifiers cost-sensitive. In: AAAI, pp 476–481

  48. Stripling E, vanden Broucke S, Antonio K, Baesens B, Snoeck M (2018) Profit maximizing logistic model for customer churn prediction using genetic algorithms. Swarm Evol Comput 40:116–130

  49. Sun Y, Kamel MS, Wong AK, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 40(12):3358–3378

  50. Tan PN, Steinbach M, Kumar V (2016) Introduction to data mining. Pearson Education, India

  51. Tang L, Thomas L, Fletcher M, Pan J, Marshall A (2014) Assessing the impact of derived behavior information on customer attrition in the financial service industry. Eur J Oper Res 236(2):624–633

  52. Ullah I, Raza B, Malik AK, Imran M, Islam SU, Kim SW (2019) A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access 7:60134–60149

  53. van Wezel M, Potharst R (2007) Improved customer choice predictions using ensemble methods. Eur J Oper Res 181(1):436–452

  54. Verbeke W, Dejaeger K, Martens D, Hur J, Baesens B (2012) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. Eur J Oper Res 218(1):211–229

  55. Verbraken T, Verbeke W, Baesens B (2013) A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Trans Knowl Data Eng 25(5):961–973

  56. Wei C-P, Chiu I-T (2002) Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst Appl 23(2):103–112

  57. Zhang C, Ni M, Yin H, Qiu K (2018) Developed density peak clustering with support vector data description for access network intrusion detection. IEEE Access 6:46356–46362

  58. Zhou Z-H, Liu X-Y (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77

  59. Zhu B, Baesens B, vanden Broucke SK (2017) An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf Sci 408:84–99

  60. Zhu H, Wang X (2017) A cost-sensitive semi-supervised learning model based on uncertainty. Neurocomputing 251:106–114


Author information

Corresponding author

Correspondence to Ruholla Jafari-Marandi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Video 1 (MP4 74563 kb)

Appendices

Appendix 1

Accepting the membership of the test set as predicted by an SOM trained on the train and validation sets is an important assumption. We therefore tested it by comparing the test-set memberships obtained when the SOM learns from all customers in the dataset against those obtained when it learns only from the train and validation sets.

In the literature, there are measures for comparing the similarity of two clusterings. Clustering, unlike classification, does not have the luxury of class labels, which makes it challenging to compare and evaluate the performance of different methods. Two approaches have been proposed for comparing two clusterings (C and C′) of the same set of data points: counting pairs and set matching [41]. Here we introduce only two common pair-counting measures. When comparing C and C′, each pair of data points falls into one of four cases according to its memberships under C and C′, denoted \(\bullet \bullet\), \(\circ \circ\), \(\bullet \circ\), and \(\circ \bullet\). Two filled circles denote that both data points are in the same cluster under both C and C′, and two hollow circles signify that the pair shares a cluster under neither C nor C′; these two cases capture the similarities between C and C′. One filled and one hollow circle indicate that the pair shares a cluster under only one of C and C′. Equations 10 and 11 define the Fowlkes–Mallows and Rand measures, respectively. In these formulas, \(N_{\bullet \bullet}\), \(N_{\circ \circ}\), n, k, k′, \(n_k\), and \(n_{k'}\) stand for, respectively, the number of pairs that share a cluster under both C and C′, the number of pairs that share a cluster under neither C nor C′, the number of data points, the number of clusters under C, the number of clusters under C′, the number of data points in cluster k of C, and the number of data points in cluster k′ of C′:

$$F\left( C, C^{\prime} \right) = \sqrt{ \frac{N_{\bullet \bullet}}{\sum_{k} n_{k}\left( n_{k} - 1 \right)/2} \times \frac{N_{\bullet \bullet}}{\sum_{k^{\prime}} n_{k^{\prime}}\left( n_{k^{\prime}} - 1 \right)/2} }$$
(10)
$$R\left( C, C^{\prime} \right) = \frac{N_{\bullet \bullet} + N_{\circ \circ}}{n\left( n - 1 \right)/2}.$$
(11)
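
For concreteness, the following is a minimal Python sketch of the two pair-counting measures of Eqs. 10 and 11, assuming cluster memberships are given as plain label lists; the function and variable names are ours, not the paper's:

```python
from collections import Counter
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Count pairs co-clustered under both C and C' (n_both) and
    pairs co-clustered under neither (n_neither)."""
    n_both = n_neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n_both += 1          # the pair is together under both C and C'
        elif not same_a and not same_b:
            n_neither += 1       # the pair is apart under both C and C'
    return n_both, n_neither

def fowlkes_mallows(labels_a, labels_b):
    """Eq. 10: N.. normalized by the within-cluster pair counts of C and C'."""
    n_both, _ = pair_counts(labels_a, labels_b)
    pairs_a = sum(c * (c - 1) // 2 for c in Counter(labels_a).values())
    pairs_b = sum(c * (c - 1) // 2 for c in Counter(labels_b).values())
    return ((n_both / pairs_a) * (n_both / pairs_b)) ** 0.5

def rand_index(labels_a, labels_b):
    """Eq. 11: fraction of the n(n-1)/2 pairs on which C and C' agree."""
    n = len(labels_a)
    n_both, n_neither = pair_counts(labels_a, labels_b)
    return (n_both + n_neither) / (n * (n - 1) / 2)
```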

Although these measures can compare two different SOM outputs [27], the output of a SOM carries a dimension beyond plain clustering that they do not capture. In ordinary clustering, there is no defined relationship between two clusters; however, if each neuron of a SOM is regarded as a cluster, each cluster has neighboring clusters. This is, in fact, why Step 6 of the SOED procedure (Algorithm 1) assigns XY coordinates to the members of each cluster. The measures in Eqs. 10 and 11 cannot capture this added dimension, since both are based only on how similarly the clustering technique segments all pairs of data points into clusters. The proposed comparison measure, designed expressly for SOM outputs with the same topology, captures the same similarity while including the neighborhood dimension. In Eq. 12, n, α, Loc_C(i), and dist(P1, P2) are, respectively, the number of data points, a constant parameter that falls between zero and the maximum possible distance between any two clusters in the SOM output, the location of the cluster of which data point i is a member, and the distance between P1 and P2. α is determined through preliminary experiments so that SOMC returns 0 for two random clusterings; it may differ with the topology of the SOM output, and is set to 1.97 for the 7 × 7 SOM output used here:

$${\text{SOMC}}\left( O, O^{\prime} \right) = \frac{\sum_{i = 1}^{n} \sum_{j = i + 1}^{n} \left( \alpha - {\text{dist}}\left( {\text{Loc}}\_C\left( i \right), {\text{Loc}}\_C\left( j \right) \right) \right)}{\alpha \times n \times \left( n - 1 \right)/2}.$$
(12)
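
Below is a hedged Python sketch of a SOMC-style computation, not the authors' exact implementation. One assumption is labeled explicitly: the extracted formula references only one set of cluster locations, so it is ambiguous here how O′ enters Eq. 12; the sketch takes the reading in which each pair contributes α minus the disagreement between its grid distances under the two outputs, which yields 1 for identical outputs and roughly 0 for random ones, consistent with the stated calibration of α:

```python
import math

def grid_dist(p, q):
    """Euclidean distance between two neuron locations on the SOM grid."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def somc(loc_a, loc_b, alpha=1.97):
    """SOMC-style similarity between two SOM outputs.

    loc_a[i] / loc_b[i]: (x, y) grid coordinates of the neuron (cluster)
    owning data point i under outputs O and O'. alpha (1.97 for the 7 x 7
    topology) is calibrated so two random clusterings score about 0.
    """
    n = len(loc_a)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # ASSUMED reading of Eq. 12: penalize a pair by how much its
            # grid distance differs between the two outputs.
            d_a = grid_dist(loc_a[i], loc_a[j])
            d_b = grid_dist(loc_b[i], loc_b[j])
            total += alpha - abs(d_a - d_b)
    return total / (alpha * n * (n - 1) / 2)
```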

The comparisons between different SOMs made by Keramati and Jafari-Marandi [27] revealed that Rand's measure (Eq. 11) is not as discriminating as the Fowlkes–Mallows measure (Eq. 10). Table 4 presents the Fowlkes–Mallows measure (FM, Eq. 10) and the proposed measure of Eq. 12 (SOMC) for the output of different SOM settings: Complete Data (CD), Train Data (TD), 10% TD, Benchmark 1 (B1), and Benchmark 2 (B2). Complete Data and Train Data are the experiments that form the main point of comparison in Table 5. Complete Data (CD) is the SOM setting that uses the complete data (train, validation, and test sets) to derive the membership vector of the test set; Train Data (TD) uses only the train and validation sets for the same output; 10% TD uses only 10% of the train set. Benchmark 1 (B1) is a simulated SOM output that assigns a random membership vector to the test set, whereas Benchmark 2 (B2) is an intentional membership vector in which every case is a member of cluster 1. B1 and B2 serve as points of reference for understanding the range of the applied measures. FM ranges between 0 and 1, zero meaning no similarity and one a perfect match. SOMC ranges between − 1 and 1: negative one indicates negative matching, zero random matching, and one complete matching. Every cell in Table 4 is the average of ten experiments, to control for the random nature of SOM, whereas every cell in Table 5 is the value of SOMC or FM for the presented example.

Table 5 SOMC and Fowlkes–Mallows (FM) measures for different SOM outputs

The comparison between the memberships of test set records under the CD and TD cases shows a meaningful similarity. Based on this similarity, we accept the validity of the membership of the test set as predicted by the SOM trained on the train and validation sets. The misclassification cost of the members of the test set is then calculated based on the customer revenue computed for each cluster in MOD.
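
To illustrate how the B1 and B2 reference points of Table 4 could be produced and scored, here is a hypothetical end-to-end snippet that reuses the fowlkes_mallows and somc sketches above. The grid size matches the paper's 7 × 7 topology, but the sample size, seed, and reference membership are arbitrary stand-ins, not the paper's data:

```python
import random

random.seed(0)
n_points, grid = 500, 7  # 7 x 7 SOM topology, as in the paper
neurons = [(x, y) for x in range(grid) for y in range(grid)]

# Stand-in for the CD membership of the test set (hypothetical data).
reference = [random.choice(neurons) for _ in range(n_points)]
b1 = [random.choice(neurons) for _ in range(n_points)]  # B1: random membership vector
b2 = [neurons[0]] * n_points                            # B2: every case in cluster 1

labels_ref = [neurons.index(p) for p in reference]
for name, cand in (("B1", b1), ("B2", b2)):
    labels_cand = [neurons.index(p) for p in cand]
    print(name,
          "FM =", round(fowlkes_mallows(labels_ref, labels_cand), 3),
          "SOMC =", round(somc(reference, cand), 3))
```

As the paper's calibration of α suggests, B1 should score near zero on SOMC, while B2 illustrates why FM alone can be misleading on degenerate memberships.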

Appendix 2

Profiling the clusters of customers is valuable because it creates insight into the kinds of customers that lie in the dividing part of the SOED map. Figure 5a shows these regions within the employed SOED map. N1 to N7 are the clusters containing non-churn cases, and C1 to C5 are the clusters of churn cases whose profiling is worthwhile for SOED's procedure. It is helpful to look at the average value of each attribute with the help of a color-coded map based on the scaled averages of all clusters. Figure 15 presents these maps for all the attributes in the dataset.

Fig. 15 SOED colored map for all the dataset attributes

Appendix 3

See Tables 7, 8, 9, 10, 11, and 12, respectively presenting the 20 validation runs for CS MLP 1–4, CS DT, and CS AdaBoost; Table 6 presents SOM hit rate examples for the Table 4 experiments.

Table 6 SOM hit rate examples for Table 4 experiments
Table 7 CS MLP 1: 20 validation runs
Table 8 CS MLP 2: 20 validation runs
Table 9 CS MLP 3: 20 validation runs
Table 10 CS MLP 4: 20 validation runs
Table 11 CS DT: 20 validation runs
Table 12 CS AdaBoost: 20 validation runs


About this article


Cite this article

Jafari-Marandi, R., Denton, J., Idris, A. et al. Optimum profit-driven churn decision making: innovative artificial neural networks in telecom industry. Neural Comput & Applic 32, 14929–14962 (2020). https://doi.org/10.1007/s00521-020-04850-6

