Abstract
Knowledge-based churn prediction and decision making are invaluable for telecom companies operating in highly competitive markets. The comprehensiveness and actionability of a data-driven churn prediction system depend on the effective extraction of hidden patterns from the data. Generally, data analytics is employed to extrapolate the patterns extracted from the training dataset to the test set. In this study, one step further is taken: improved prediction performance is attained by capturing the individuality of each customer while discovering the hidden patterns in the training set and then applying all of that knowledge to the test set. The proposed churn prediction system is developed using artificial neural networks that take advantage of both self-organizing and error-driven learning approaches (ChP-SOEDNN). We introduce a new dimension to the study of churn prediction in the telecom industry: a systematic and profit-driven churn decision-making framework. The comparison of ChP-SOEDNN with other techniques shows its superiority regarding both accuracy and misclassification cost. Misclassification cost is a realistic criterion, introduced in this article, that measures the success of a method in finding the set of decisions that leads to the minimum possible loss of profit. Moreover, ChP-SOEDNN shows capability in devising a cost-efficient retention strategy for each cluster of customers, in addition to strength in dealing with the imbalanced class distribution that is typical of churn prediction problems.
References
Ahmed U, Khan A, Khan SH, Basit A, Haq IU, Lee YS (2019) Transfer learning and meta classification based deep churn prediction system for telecom industry. Preprint arXiv:1901.06091
Amin A, Al-Obeidat F, Shah B, Adnan A, Loo J, Anwar S (2019) Customer churn prediction in telecommunication industry using data certainty. J Bus Res 94:290–301
Amin A, Anwar S, Adnan A, Nawaz M, Alawfi K, Hussain A, Huang K (2017) Customer churn prediction in the telecommunication sector using a rough set approach. Neurocomputing 237:242–254
Amin A, Anwar S, Adnan A, Nawaz M, Howard N, Qadir J, Hawalah A, Hussain A (2016) Comparing oversampling techniques to handle the class imbalance problem: a customer churn prediction case study. IEEE Access 4:7940–7957
Bahnsen AC, Aouada D, Ottersten B (2015) Example-dependent cost-sensitive decision trees. Expert Syst Appl 42(19):6609–6619
Bahnsen AC, Aouada D, Ottersten B (2015) A novel cost-sensitive framework for customer churn predictive modeling. Decis Anal 2(1):5
Berger PD, Nasr NI (1998) Customer lifetime value: Marketing models and applications. J Interact Mark 12(1):17–30
Bi W, Cai M, Liu M, Li G (2016) A big data clustering algorithm for mitigating the risk of customer churn. IEEE Trans Ind Inf 12(3):1270–1281
Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36(3):4626–4636
Chen Z-Y, Fan Z-P (2012) Distributed customer behavior prediction using multiplex data: a collaborative MK-SVM approach. Knowl Based Syst 35:111–119
Chen Z-Y, Fan Z-P, Sun M (2012) A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur J Oper Res 223(2):461–472
De Bock KW, Van den Poel D (2012) Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models. Expert Syst Appl 39(8):6816–6826
Ekinci Y, Ülengin F, Uray N, Ülengin B (2014) Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. Eur J Oper Res 237(1):278–288
Fader PS, Hardie BG, Lee KL (2005) RFM and CLV: using iso-value curves for customer base analysis. J Mark Res 42(4):415–430
García DL, Nebot À, Vellido A (2017) Intelligent data analysis approaches to churn as a business problem: a survey. Knowl Inf Syst 51(3):719–774
Glady N, Baesens B, Croux C (2009) Modeling churn using customer lifetime value. Eur J Oper Res 197(1):402–411
Gruca TS, Rego LL (2005) Customer satisfaction, cash flow, and shareholder value. J Mark 69(3):115–130
Gupta S, Lehmann DR, Stuart JA (2004) Valuing customers. J Mark Res 41(1):7–18
Gurney K (2014) Multilayer nets and backpropagation. In: An introduction to neural networks, 1st edn. CRC Press, Boca Raton, pp 41–57
Han S, Yuan B, Liu W (2009) Rare class mining: progress and prospect. In: 2009 Chinese conference on pattern recognition (CCPR). IEEE, New York, pp 1–5
Höppner S, Stripling E, Baesens B, vanden Broucke S, Verdonck T (2018) Profit driven decision trees for churn prediction. Eur J Oper Res 286(3):920–933
Huang B, Kechadi MT, Buckley B (2012) Customer churn prediction in telecommunications. Expert Syst Appl 39(1):1414–1425
Huang Y, Kechadi T (2013) An effective hybrid learning system for telecommunication churn prediction. Expert Syst Appl 40(14):5635–5647
Idris A, Khan A, Lee YS (2012) Genetic programming and adaboosting based churn prediction for telecom. In: 2012 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, New York, pp 1328–1332
Idris A, Khan A, Lee YS (2013) Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39(3):659–672
Idris A, Rizwan M, Khan A (2012) Churn prediction in telecom using random forest and PSO based data balancing in combination with various feature selection strategies. Comput Electr Eng 38(6):1808–1819
Keramati A, Jafari-Marandi R (2014) Webpage clustering—taking the zero step: a case study of an Iranian website. J Web Eng 13(3–4):333–360
Jafari-Marandi R, Davarzani S, Gharibdousti MS, Smith BK (2018) An optimum ANN-based breast cancer diagnosis: bridging gaps between ANN learning and decision-making goals. Appl Soft Comput 72:108–120
Jafari-Marandi R, Khanzadeh M, Smith BK, Bian L (2017) Self-organizing and error driven (SOED) artificial neural network for smarter classifications. J Comput Des Eng 4(4):282–304
Jafari-Marandi R, Khanzadeh M, Tian W, Smith B, Bian L (2019) From in-situ monitoring toward high-throughput process control: cost-driven decision-making framework for laser-based additive manufacturing. J Manufact Syst 51:29–41
Keramati A, Jafari-Marandi R, Aliannejadi M, Ahmadian I, Mozaffari M, Abbasi U (2014) Improved churn prediction in telecommunication industry using data mining techniques. Appl Soft Comput 24:994–1012
Khan A, Sohail A, Ali A (2018) A new channel boosted convolutional neural network using transfer learning. Preprint arXiv:1804.08528
Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65
Lee H, Lee Y, Cho H, Im K, Kim YS (2011) Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model. Decis Support Syst 52(1):207–216
Lemmens A, Gupta S (2017) Managing churn to maximize profits. Working paper, available at SSRN 2964906
Liu Y, Zhuang Y (2015) Research model of churn prediction based on customer segmentation and misclassification cost in the context of big data. J Comput Commun 3(06):87
Lu N, Lin H, Lu J, Zhang G (2014) A customer churn prediction model in telecom industry using boosting. IEEE Trans Ind Inf 10(2):1659–1665
Maldonado S, López J, Vairetti C (2019) Profit-based churn prediction based on minimax probability machines. Eur J Oper Res 284(1):273–284
Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2):427–436
Meilă M (2007) Comparing clusterings—an information based distance. J Multivar Anal 98(5):873–895
World Health Organization (2010) World health statistics. World Health Organization, Geneva
Prashanth R, Deepak K, Meher AK (2017) High accuracy predictive modelling for customer churn prediction in telecom industry. In: International conference on machine learning and data mining in pattern recognition. Springer, Berlin, pp 391–402
Reinartz WJ, Kumar V (2003) The impact of customer relationship characteristics on profitable lifetime duration. J Mark 67(1):77–99
Risselada H, Verhoef PC, Bijmolt TH (2010) Staying power of churn prediction models. J Interact Mark 24(3):198–208
Saito T, Rehmsmeier M (2015) The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432
Sheng VS, Ling CX (2006) Thresholding for making classifiers cost-sensitive. In: AAAI, pp 476–481
Stripling E, vanden Broucke S, Antonio K, Baesens B, Snoeck M (2018) Profit maximizing logistic model for customer churn prediction using genetic algorithms. Swarm Evol Comput 40:116–130
Sun Y, Kamel MS, Wong AK, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 40(12):3358–3378
Tan PN, Steinbach M, Kumar V (2016) Introduction to data mining. Pearson Education, India
Tang L, Thomas L, Fletcher M, Pan J, Marshall A (2014) Assessing the impact of derived behavior information on customer attrition in the financial service industry. Eur J Oper Res 236(2):624–633
Ullah I, Raza B, Malik AK, Imran M, Islam SU, Kim SW (2019) A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access 7:60134–60149
van Wezel M, Potharst R (2007) Improved customer choice predictions using ensemble methods. Eur J Oper Res 181(1):436–452
Verbeke W, Dejaeger K, Martens D, Hur J, Baesens B (2012) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. Eur J Oper Res 218(1):211–229
Verbraken T, Verbeke W, Baesens B (2013) A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Trans Knowl Data Eng 25(5):961–973
Wei C-P, Chiu I-T (2002) Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst Appl 23(2):103–112
Zhang C, Ni M, Yin H, Qiu K (2018) Developed density peak clustering with support vector data description for access network intrusion detection. IEEE Access 6:46356–46362
Zhou Z-H, Liu X-Y (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77
Zhu B, Baesens B, vanden Broucke SK (2017) An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf Sci 408:84–99
Zhu H, Wang X (2017) A cost-sensitive semi-supervised learning model based on uncertainty. Neurocomputing 251:106–114
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Video 1 (MP4 74563 kb)
Appendices
Appendix 1
Accepting the membership of the test set, as predicted by the SOM trained on the train and validation sets, as the true test set membership is an important assumption. It has therefore been tested by comparing the membership of test set cases when the SOM learns from all customers in the dataset with the membership obtained when the SOM learns only from the train and validation sets.
In the literature, there are measures for comparing the similarity of two clusterings. Clustering, unlike classification, does not have the luxury of class labels, a fact that makes comparing and evaluating the performance of different methods challenging. There are two proposed approaches for comparing two sets of clusters (C and C′) on the same set of data points: counting pairs and set matching [41]. We introduce only two common pair-counting measures. When comparing C and C′, each pair of data points falls under one of four cases based on its memberships in C and C′, denoted \(\bullet \bullet\), \(\circ \circ\), \(\bullet \circ\), and \(\circ \bullet\). Two filled circles denote that both data points are in the same cluster under both C and C′, and two hollow circles signify that the pair is in the same cluster under neither C nor C′; these two cases capture the similarities between C and C′. One filled circle and one hollow circle indicate that the pair is a member of the same cluster under only one of C and C′, but not the other. Equations 10 and 11 give, respectively, the Fowlkes and Mallows measure and Rand's measure. In these formulas \(N_{\bullet \bullet}\), \(N_{\circ \circ}\), n, k, k′, \(n_k\), and \(n_{k'}\) stand, respectively, for the number of pairs that fall in the same cluster under both C and C′, the number of pairs that are in the same cluster under neither C nor C′, the number of data points, the number of clusters under C, the number of clusters under C′, the number of data points in cluster k of C, and the number of data points in cluster k′ of C′:

$$\mathrm{FM}(C, C') = \frac{N_{\bullet \bullet}}{\sqrt{\left(\sum_{k} \frac{n_k(n_k - 1)}{2}\right)\left(\sum_{k'} \frac{n_{k'}(n_{k'} - 1)}{2}\right)}} \qquad (10)$$

$$\mathrm{Rand}(C, C') = \frac{N_{\bullet \bullet} + N_{\circ \circ}}{n(n - 1)/2} \qquad (11)$$
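For concreteness, the following is a minimal Python sketch of the two pair-counting measures; it implements the standard Fowlkes and Mallows and Rand formulations written above (the function names are ours):

```python
from itertools import combinations
from collections import Counter
from math import comb, sqrt

def pair_counts(c, cp):
    """Count pairs in the same cluster under both C and C' (N**),
    and pairs in different clusters under both (Noo)."""
    n_ss = n_dd = 0
    for i, j in combinations(range(len(c)), 2):
        same_c, same_cp = c[i] == c[j], cp[i] == cp[j]
        n_ss += same_c and same_cp
        n_dd += (not same_c) and (not same_cp)
    return n_ss, n_dd

def fowlkes_mallows(c, cp):
    """Eq. 10: N** normalized by the geometric mean of the
    within-cluster pair counts of C and C'."""
    n_ss, _ = pair_counts(c, cp)
    w_c = sum(comb(s, 2) for s in Counter(c).values())
    w_cp = sum(comb(s, 2) for s in Counter(cp).values())
    return n_ss / sqrt(w_c * w_cp)

def rand_index(c, cp):
    """Eq. 11: fraction of all pairs on which C and C' agree."""
    n_ss, n_dd = pair_counts(c, cp)
    return (n_ss + n_dd) / comb(len(c), 2)

# Two clusterings of six points
c_a = [0, 0, 1, 1, 2, 2]
c_b = [0, 0, 0, 1, 2, 2]
print(fowlkes_mallows(c_a, c_b))  # ~0.577
print(rand_index(c_a, c_b))       # 0.8
```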
Although these measures are capable of comparing two different SOM outputs [27], the output of a SOM carries a dimension beyond plain clustering that these measures do not capture. In an ordinary clustering there is no defined relationship between two clusters; however, if each neuron in a SOM is taken to be a cluster, each cluster has neighboring clusters. In fact, this is why in Step 6 of the SOED procedure (Algorithm 1) XY coordinates are assigned to the members of each cluster. The measures introduced in Eqs. 10 and 11 cannot capture this added dimension: both equations rest only on how similarly the clustering technique has segmented all pairs of data points into clusters. The proposed comparison measure, expressly designed for SOM outputs with the same topologies, captures the same similarity while including the neighboring dimension. In Eq. 12, n, α, Loc_C(i), and dist(P1, P2) are, respectively, the number of data points, a constant parameter that falls between zero and the maximum possible distance between any two clusters in the SOM output, the location of the cluster that has data point i as a member, and the distance between P1 and P2. Preliminary experiments determine α so that SOMC returns 0 for two random clusterings; α may differ across SOM topologies and is set to 1.97 in the case of the 7 × 7 SOM output.
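Since Eq. 12 itself is not reproduced in this appendix, the Python sketch below shows only one plausible reading of SOMC consistent with the stated properties (identical maps score 1; random maps score about 0 once α is calibrated). The function and its exact form are our assumption, not the authors' formula:

```python
import numpy as np

def somc(locs_c, locs_cp, alpha=1.97):
    """Hypothetical sketch of SOMC: penalize each data point by the grid
    distance between its winning neurons under the two SOM trainings.
    locs_c[i] and locs_cp[i] are assumed to be the XY coordinates
    Loc_C(i) and Loc_C'(i); alpha is calibrated so that two random
    clusterings score about 0 (1.97 for the 7 x 7 map in the text).
    The published Eq. 12 may differ, e.g., in how it is bounded to [-1, 1]."""
    d = np.linalg.norm(np.asarray(locs_c, float) - np.asarray(locs_cp, float), axis=1)
    return 1.0 - d.mean() / alpha

# Example: three points mapped to nearby neurons under two SOM runs
print(somc([(0, 0), (3, 3), (6, 6)], [(0, 1), (3, 3), (6, 5)]))  # ~0.66
```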
The comparisons between different SOMs made by Keramati and Jafari-Marandi [27] revealed that Rand's measure (Eq. 11) is not as discriminating as Fowlkes and Mallows (Eq. 10). Table 4 presents the Fowlkes and Mallows measure (FM, Eq. 10) and the proposed measure of Eq. 12 (SOMC) for the output of different SOM settings: Complete Data (CD), Train Data (TD), 10% TD, Benchmark 1 (B1), and Benchmark 2 (B2). Complete Data and Train Data stand for the experiments that are the main point of comparison in Table 5. Complete Data (CD) is the SOM setting that uses the complete data (train, validation, and test sets) to derive the membership vector of the test set; Train Data (TD) uses only the train and validation sets for the same output; and 10% TD uses only 10% of the train set. Benchmark 1 (B1) is a simulated SOM output that assigns a random membership vector to the test set, whereas Benchmark 2 (B2) is an intentional membership vector in which all cases are members of cluster 1. B1 and B2 serve as points of reference for understanding the range of the applied measures. FM ranges between 0 and 1, zero being no similarity and one a perfect match. SOMC ranges between −1 and 1: negative one being negative matching, zero random matching, and one complete matching. Every cell in Table 4 is the average of ten experiments, to control for the random nature of SOM, whereas every cell in Table 5 is the value of SOMC or FM for the presented example.
The comparison between the memberships of test set records under the CD and TD cases shows a meaningful similarity. Based on this similarity, we accept the validity of the membership of the test set as predicted by the SOM trained using only the train and validation sets. The misclassification cost of the members of the test set is then calculated based on the customer revenue computed for each cluster in MOD.
Appendix 2
Profiling the clusters of customers is valuable because it creates insight into the kinds of customers that lie in the dividing part of the SOED map. Figure 5a shows these regions within the employed SOED map. N1 to N7 are the clusters housing non-churn cases, and C1 to C5 are the clusters of churn cases whose profiling is worthwhile for SOED's procedure. It is helpful to look at the average value of each attribute with the help of a color-coded map based on the scaled averages across all clusters. Figure 15 presents these maps for all the attributes in the dataset.
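As a minimal sketch of how such a profile can be produced, the following assumes a hypothetical customer table with a cluster column holding the SOED cluster labels; the file name and column names are illustrative:

```python
import pandas as pd

# Hypothetical input: one row per customer, a 'cluster' label from the
# SOED map (e.g., 'N1'..'N7', 'C1'..'C5') plus the dataset attributes.
df = pd.read_csv("customers_with_clusters.csv")  # illustrative file name

# Average value of each attribute per cluster.
profile = df.groupby("cluster").mean(numeric_only=True)

# Min-max scale every attribute across clusters so that a color-coded
# map (as in Fig. 15) highlights relative differences between clusters.
scaled = (profile - profile.min()) / (profile.max() - profile.min())
print(scaled.round(2))
```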
Appendix 3
See Tables 6, 7, 8, 9, 10, and 11, respectively, representing the 20 validation runs for CS MLP 1–4, CS DT, and CS AdaBoost (Table 12).
Cite this article
Jafari-Marandi, R., Denton, J., Idris, A. et al. Optimum profit-driven churn decision making: innovative artificial neural networks in telecom industry. Neural Comput & Applic 32, 14929–14962 (2020). https://doi.org/10.1007/s00521-020-04850-6