Abstract
Handling class imbalance is a challenging task in data mining and machine learning. Most classifiers are biased towards majority-class examples when learning from highly imbalanced data. In practice, churn prediction is a data mining application that typically exhibits class imbalance. This study investigates how to handle class imbalance in churn prediction using RUSBoost, an algorithm that combines random under-sampling with boosting, together with feature selection to improve performance. The datasets consist of broadband internet data collected from a telecommunication company in Indonesia. The study first selects the important features using Information Gain and then builds a churn prediction model using RUSBoost with C4.5 as the weak learner. The results show that feature selection and RUSBoost improve prediction performance by 16% and reduce processing time by 48%.
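The two-stage pipeline described in the abstract (rank features by Information Gain, then boost a weak learner on randomly under-sampled data) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names, and the feature names are hypothetical, a one-feature decision stump stands in for C4.5, and a simplified AdaBoost-style weight update is used instead of the exact update of the original RUSBoost formulation.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def fit_stump(X, y, w):
    """Weighted one-feature stump (assumption: stands in for C4.5)."""
    best = None
    for j in range(len(X[0])):
        votes = {}
        for row, label, wi in zip(X, y, w):
            votes.setdefault(row[j], Counter())[label] += wi
        rule = {v: c.most_common(1)[0][0] for v, c in votes.items()}
        err = sum(wi for row, label, wi in zip(X, y, w) if rule[row[j]] != label)
        if best is None or err < best[0]:
            best = (err, j, rule)
    _, j, rule = best
    default = Counter(y).most_common(1)[0][0]  # used for unseen feature values
    return lambda row: rule.get(row[j], default)

def rusboost(X, y, rounds=5, seed=0):
    """Simplified RUSBoost-style loop: under-sample, fit, reweight."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i in range(n) if y[i] == minority]
    maj_idx = [i for i in range(n) if y[i] != minority]
    ensemble = []
    for _ in range(rounds):
        # Random under-sampling: all minority rows plus an equal-sized majority draw.
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        total = sum(w[i] for i in idx)
        h = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                      [w[i] / total for i in idx])
        # The weighted error is measured on the full training set.
        miss = [h(X[i]) != y[i] for i in range(n)]
        err = min(max(sum(wi for wi, m in zip(w, miss) if m), 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # weak learner no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
        s = sum(w)
        w = [wi / s for wi in w]
    fallback = max(counts, key=counts.get)

    def predict(row):
        score = Counter()
        for alpha, h in ensemble:
            score[h(row)] += alpha
        return score.most_common(1)[0][0] if score else fallback
    return predict

# Hypothetical toy data: columns are (complaint level, plan); 3 churners, 9 stayers.
X = [["high", "A"], ["high", "B"], ["high", "A"]] + \
    [["low", "A" if i % 2 == 0 else "B"] for i in range(9)]
y = ["churn"] * 3 + ["stay"] * 9

# Stage 1: rank features by Information Gain and keep the best one.
gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
top = max(range(len(gains)), key=gains.__getitem__)
X_selected = [[row[top]] for row in X]

# Stage 2: train the boosted ensemble on the reduced feature set.
model = rusboost(X_selected, y)
predictions = [model(row) for row in X_selected]
```

In this toy setting the complaint-level column separates the classes perfectly, so it receives the highest gain and the ensemble reproduces the training labels. On real data the under-sampling ratio, the number of rounds, and the number of retained features would all need tuning.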
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dwiyanti, E., Adiwijaya, Ardiyanti, A. (2017). Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7). In: Herawan, T., Ghazali, R., Nawi, N.M., Deris, M.M. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2016. Advances in Intelligent Systems and Computing, vol 549. Springer, Cham. https://doi.org/10.1007/978-3-319-51281-5_38
DOI: https://doi.org/10.1007/978-3-319-51281-5_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51279-2
Online ISBN: 978-3-319-51281-5
eBook Packages: Engineering, Engineering (R0)