Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies

doi:10.1016/j.compeleceng.2012.09.001

Computers & Electrical Engineering

Volume 38, Issue 6, November 2012, Pages 1808-1819

https://doi.org/10.1016/j.compeleceng.2012.09.001

Abstract

The telecommunication industry faces fierce competition to retain customers and therefore requires an efficient churn prediction model to monitor customer churn. The enormous size, high dimensionality and imbalanced nature of telecommunication datasets are the main hurdles in attaining the desired churn prediction performance. In this study, we investigate the significance of a Particle Swarm Optimization (PSO) based undersampling method for handling the imbalanced data distribution, in combination with different feature reduction techniques such as Principal Component Analysis (PCA), Fisher’s ratio, F-score and Minimum Redundancy and Maximum Relevance (mRMR). Random Forest (RF) and K Nearest Neighbour (KNN) classifiers are employed to evaluate performance on the optimally sampled, feature-reduced dataset. Prediction performance is evaluated using sensitivity, specificity and Area Under the Curve (AUC) based measures. Simulations show that our proposed approach based on PSO, mRMR and RF, termed Chr-PmRF, performs quite well in predicting churners and can therefore be beneficial for the highly competitive telecommunication industry.

Highlights

► The telecom industry faces fierce competition to retain customers.
► Enormous size, imbalanced data and high dimensionality make churn prediction in telecom a challenging problem.
► Our proposed approach, named Chr-PmRF, employs PSO based balancing, mRMR feature reduction and Random Forest as a classifier.
► Chr-PmRF efficiently predicts churners and might be beneficial for the highly competitive telecommunication industry.

Introduction

Telecommunication is one of the industries in which the customer base plays a significant role in maintaining stable revenues, and serious attention is therefore devoted to retaining customers. Customers’ propensity to switch to another viable network varies for different reasons, such as call quality, more attractive pricing plans from competitors and billing problems. The telecommunication industry constantly faces the threat of financial loss from potential churners; an efficient churn prediction model therefore not only secures revenues but also gives management hints for targeting potential churners by reducing market-relevant shortcomings. Hence, customer relationship management in a telecommunication company needs an efficient model for predicting potential churners.

The efficiency of a churn prediction model based on a classification system relies on the learning acquired from the available dataset. An appropriately preprocessed dataset helps the classifier attain the required level of training, which ultimately translates into the desired performance. Telecommunication companies archive a large amount of information about their customers. Unfortunately, such data has high dimensionality and an imbalanced class distribution. Generally, information regarding demographics, contract nature, billing and payments, call details, service logs, etc. is maintained, which leads to the high dimensionality. Similarly, the number of churners in the telecommunication industry is usually far smaller than the number of non-churners, which results in an imbalanced dataset. This imbalanced distribution might cause weak learning by a classifier. Therefore, the preprocessing phase requires a proper sampling and feature reduction strategy for the classifier to learn well.

Principal Component Analysis (PCA) and Independent Component Analysis (ICA) [1] are commonly used feature selection strategies, which operate linearly to select the useful and discriminating features present in a dataset. PCA is based on data covariance, while ICA uses higher order statistics to achieve data independence, along with reducing the dimensionality of the data. Similarly, some well-known sampling techniques are Random Oversampling (ROS) and Random Undersampling (RUS) [2], in which instances of the minority class are duplicated and instances of the majority class are discarded, respectively. Because of the random selection involved in duplicating and discarding instances, these approaches lack consistency and show varying performance. In addition, RUS can discard useful instances and ROS can lead to overfitting owing to replication. One Sided Selection (OSS) removes the noisy and borderline majority class instances, but it is slow on large datasets because it relies on Tomek Links [3], which are costly to compute. Cluster based oversampling identifies rare cases in the dataset and resamples the instances, but it is considered effective only for small training datasets [4], [5]. The Synthetic Minority Oversampling Technique is an intelligent oversampling method in which new minority class samples are added synthetically, but it involves a high computational cost [6] and is thus not suitable for large datasets. DataBoost-IM [7] is another sampling approach, in which the predictive occurrences of both minority and majority classes are increased using synthetic data generation; it also involves a high computational cost and is therefore not appropriate for large datasets. Most sampling techniques thus use either random selection for undersampling, which introduces bias, or synthetic generation of minority class samples, which is costly. Therefore, an optimized sampling technique can be employed to sample the dataset and effectively mitigate the imbalance in the data distribution.
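To make the two random baselines concrete, the following is a minimal Python sketch of RUS and ROS on a toy two-class dataset. It is an illustration only: the function names, the toy data, the 90/10 class split and the fixed seed are assumptions made for this sketch and are not taken from the study.

```python
import random
from collections import Counter

def random_undersample(rows, labels, majority, rng):
    """RUS: keep all minority rows and a random, equally sized set of majority rows."""
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = rng.sample(majority_idx, len(minority_idx))  # the rest is discarded
    idx = sorted(minority_idx + kept)
    return [rows[i] for i in idx], [labels[i] for i in idx]

def random_oversample(rows, labels, majority, rng):
    """ROS: duplicate randomly chosen minority rows until both classes are equal in size."""
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]  # exact copies
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Hypothetical toy data: 90 non-churners (label 0) and 10 churners (label 1).
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(100)]
labels = [0] * 90 + [1] * 10

rus_rows, rus_labels = random_undersample(rows, labels, majority=0, rng=rng)
ros_rows, ros_labels = random_oversample(rows, labels, majority=0, rng=rng)
print(Counter(rus_labels), Counter(ros_labels))
```

Changing the seed changes which majority rows survive RUS and which minority rows are copied by ROS, which is the source of the inconsistency noted above.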

Besides the feature selection and sampling techniques required to handle the imbalanced telecommunication dataset, classification models are the tools that actually perform the customer churn prediction. Researchers have used Decision Trees [8], [9], [10], Logistic Regression [10], [11], Genetic Programming [12], [26], Neural Networks [13], [14], [15], [16], Random Forest [17], AdaBoost [19] and Naive Bayes algorithms [11] for various classification problems, including churn prediction. Some techniques have also used nonlinear kernel methods in Support Vector Machines for churn prediction, but they suffer from the high dimensionality of the dataset [8]. Other classification models, such as SVM [20], [27] and KNN [11], also show deteriorated performance in telecommunication churn prediction because of the imbalanced nature of the dataset [11]. Although some approaches based on an ensemble of KNN and logistic regression [18], Additive Groves with multiple counts feature evaluation [19] and hybrid two-phase feature selection [20] have been suggested, the classification models could not achieve the needed performance. These ensemble approaches primarily curtail the data dimensionality by selecting features and introduce data balancing in due course, but the classification performance suffers from the loss of information caused by improper sampling and feature reduction methods.

Given the challenges that the large size, high dimensionality and imbalanced nature of the telecommunication dataset pose for customer churn prediction, we first analyzed RUS and PSO based [23] undersampling methods separately. The PSO based undersampling method first subsamples the dataset and then evaluates each subsample with KNN and Random Forest on the basis of AUC. Once an optimal subsample is selected, PCA, F-score, Fisher’s ratio and mRMR are applied separately and analyzed with the RF and KNN classifiers. We finally observe that our proposed approach based on PSO, mRMR and RF, termed Chr-PmRF, provides the best results among the combinations of sampling, feature reduction and classification techniques considered.
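The idea of searching for a subsample whose classifier AUC is high can be sketched as follows. This is a hedged illustration and not the formulation used in the study, whose particle encoding, fitness details and parameters are not given in this excerpt. The sketch assumes a continuous PSO with a random-key encoding (each particle holds one priority per majority instance, and the highest-priority instances are kept), a small KNN scorer as the only classifier, a held-out validation set for the AUC, and hypothetical values for the swarm size, iteration count, `w`, `c1` and `c2`.

```python
import math
import random

def knn_score(train_x, train_y, point, k=3):
    """Fraction of the k nearest training points that are churners (label 1)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], point))[:k]
    return sum(train_y[i] for i in nearest) / k

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def decode(position, n_keep):
    """Random-key decoding (an assumption): keep the n_keep highest-priority rows."""
    return sorted(range(len(position)), key=lambda j: -position[j])[:n_keep]

def pso_undersample(maj, mino, val_x, val_y, rng, particles=8, iters=15,
                    w=0.7, c1=1.5, c2=1.5):
    n_keep = len(mino)  # balanced subsample: as many non-churners as churners

    def fitness(position):
        kept = [maj[j] for j in decode(position, n_keep)]
        train_x = kept + mino
        train_y = [0] * len(kept) + [1] * len(mino)
        return auc(val_y, [knn_score(train_x, train_y, p) for p in val_x])

    pos = [[rng.random() for _ in maj] for _ in range(particles)]
    vel = [[0.0] * len(maj) for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    start_fit = gbest_fit  # best AUC among the initial random subsamples

    for _ in range(iters):
        for i in range(particles):
            for j in range(len(maj)):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO update: inertia + pull to personal and global bests.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest, n_keep), gbest_fit, start_fit

# Hypothetical toy data: two overlapping Gaussian clouds in two dimensions.
rng = random.Random(1)

def cloud(n, cx, cy):
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(n)]

maj, mino = cloud(60, 0.0, 0.0), cloud(12, 1.5, 1.5)
val_x = cloud(30, 0.0, 0.0) + cloud(10, 1.5, 1.5)
val_y = [0] * 30 + [1] * 10

kept, best_auc, start_auc = pso_undersample(maj, mino, val_x, val_y, rng)
print(len(kept), round(start_auc, 3), round(best_auc, 3))
```

Unlike RUS, which accepts whichever majority instances a single random draw happens to keep, this search compares many candidate subsamples and retains the one with the highest validation AUC. Because the same validation set guides the search, the reported AUC is optimistic and would need to be confirmed on separate test data.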

The rest of the manuscript first presents the proposed churn prediction approach in Section 2. Next, Section 3 analyzes the simulated results and gives corresponding discussions. Finally, the conclusions are drawn in Section 4.

Section snippets

Material and methods

Telecommunication datasets generally suffer from skewed data distribution and high dimensionality, which cause classification algorithms to perform poorly in customer churn prediction. Therefore, in the Chr-PmRF approach, we concentrate on handling these problems. The basic block diagram in Fig. 1 highlights the steps involved in Chr-PmRF.

We initially preprocess the dataset in order to handle the problems of missing values and nominal values present in the dataset. RUS

Proposed Chr-PmRF approach

Among the various combinations of sampling, feature selection and classification methodologies employed, we have observed that PSO based undersampling in combination with mRMR based feature selection and the RF classifier yields the best churn prediction results. Therefore, in what follows, we focus on this particular combination, denoted Chr-PmRF. Our proposed Chr-PmRF efficiently utilizes a PSO based undersampling method, which not only undersamples the dataset but also optimizes chosen

Results and discussion

The proposed Chr-PmRF approach is validated through comprehensive experimentation with various combinations of sampling, feature selection and classification methodologies. Performance is analyzed with 10-fold cross-validation, using AUC, sensitivity and specificity as the performance measures.
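For reference, the three measures and a 10-fold split can be sketched in Python as below. The definitions are the standard ones (churners, label 1, are treated as the positive class, and AUC is computed from pairwise score comparisons). The function names, the toy labels and scores, the 0.5 threshold and the interleaved fold assignment without shuffling or stratification are assumptions made for this sketch, since the exact protocol is not described in this excerpt.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), treating label 1 (churner) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold f tests every sample with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

# Hypothetical classifier output for 3 churners and 5 non-churners.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))
# prints 0.667 0.8 0.867

folds = list(k_fold_indices(20, k=10))  # each sample is tested exactly once
```

In this toy case two of the three churners are caught (sensitivity 2/3) and four of the five non-churners are correctly left alone (specificity 4/5). AUC does not depend on the threshold: 13 of the 15 churner/non-churner pairs are ranked correctly.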

Conclusions

This work validates the claim that appropriate preprocessing and a proper data distribution are vital for classification. The PSO based optimal sampling approach not only undersamples the data but also optimizes the sample selection on the basis of the AUC measure, to attain better classification performance. The discriminating power of the optimally selected samples is further explored by employing appropriate feature selection strategies. Where mRMR returns a

Acknowledgement

This work is supported by the Higher Education Commission of Pakistan (HEC) as per Award No. 17-5-6 (Ps6-002)/HEC/Sch/2010

Adnan Idris received his M.S. degree in Computer System Engineering from GIK Institute of Engineering Sciences and Technology Topi, Pakistan in 2006. Prior to that, he earned his master’s degree in software engineering from COMSATS Institute of I.T, Islamabad in 2002. He has 7 years of research and teaching experience at university level. He is currently pursuing a PhD at Pakistan Institute of Eng. & Applied Sciences, Islamabad. His research areas include Customer Churn Prediction, Machine

References (29)

S. Yen et al. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst Appl (2009)
K.M. Osei-Bryson. Evaluation of decision trees: a multi-criteria approach. J Comput Oper Res (2004)
A. Khan et al. Genetic perceptual shaping: utilizing cover image and conceivable attack information using genetic programming. Inform Fusion (2007)
A. Khan et al. Machine learning based adaptive watermark decoding in view of an anticipated attack. Pattern Recog (2008)
A.P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recog (1997)
Vinay V, Cox IJ, Wood K, Milic-Frayling N. A comparison of dimensionality reduction techniques of text retrieval. In:...
N. Japkowicz et al. The class imbalance problem: a systematic study. Intell Data Anal (2002)
Kubat M, Matwin S. Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th...
T. Jo et al. Class imbalance versus small disjuncts. ACM SIGKDD Explorat Newslett (2004)
N.V. Chawla et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res (2002)
H. Guo et al. Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explorat Newslett (2004)
Guyon I, Lemaire V, Boulle M, Dror G, Vogel D. Analysis of kddcup2009: Fast scoring on a large orange customer...
Huang B Q, Kechadi M-T, Buckley B. Customer churn prediction for broadband internet services. In: Proceedings of the...
J. Haden et al. Computer assisted customer churn management: state-of-the-art and future trends. Comput Oper Res (2007)

Muhammad Rizwan completed his B.S. (CIS) degree at Pakistan Institute of Engineering and Applied Sciences, Islamabad. His research interests include computer programming, Machine Learning and Pattern Recognition.

Asifullah Khan received his M.S. and Ph.D. degrees in Computer Systems Engineering from GIK Institute of Engineering Sciences and Technology Topi, Pakistan, in 2003 and 2006, respectively. He spent two years as a Post-Doc Research Fellow at the Department of Mechatronics, GIST, South Korea. He is currently working as an Associate Professor in the Department of Computer and Information Sciences at PIEAS. His research areas include Digital Watermarking, Pattern Recognition, Image Processing, Evolutionary Algorithms, Bioinformatics, Machine Learning, and Computational Materials Science.

^☆: Reviews processed and approved for publication by Editor-in-Chief Dr. Manu Malek.
