Abstract
The customer churn problem has a major impact on telecommunication services in particular and on businesses in general. In most cases, the number of potential churners is much smaller than the number of non-churners. The resulting imbalanced distribution of samples between churners and non-churners is therefore a concern when building a churn prediction model. This paper presents a Local PCA approach that addresses the imbalanced classification problem by generating new churn samples. The experiments were carried out on a large real-world telecommunication dataset and assessed on a churn prediction task. They showed that Local PCA combined with SMOTE outperformed the linear regression and standard PCA data generation techniques.
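The abstract pairs the proposed Local PCA generator with SMOTE (synthetic minority over-sampling technique), which creates new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only this standard SMOTE idea, in a simplified variant that picks base samples at random. It is not the authors' Local PCA method, and the function names (`smote`, `k_nearest`) and the toy churner data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_new synthetic minority (churner) samples.

    Simplified SMOTE variant: pick a random minority sample x, pick one of its
    k nearest minority neighbours nb, and return x + gap * (nb - x) with gap
    drawn uniformly from [0, 1), i.e. a point on the segment from x to nb.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)           # seeded for reproducible output
    k = min(k, len(minority) - 1)       # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Usage on a toy, made-up set of two-feature churner records.
churners = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_churners = smote(churners, n_new=6, k=2)
print(len(new_churners), new_churners[0])
```

Because every synthetic point lies on a segment between two existing churners, SMOTE stays inside the region already occupied by the minority class. The abstract reports that combining such interpolation with the proposed Local PCA generator gave better churn prediction than linear regression or standard PCA generation alone.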
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sato, T., Huang, B.Q., Huang, Y., Kechadi, M.T. (2010). Local PCA Regression for Missing Data Estimation in Telecommunication Dataset. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol. 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_67
DOI: https://doi.org/10.1007/978-3-642-15246-7_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)