Abstract
We develop an innovative data preprocessing algorithm for classifying customers from unbalanced time series data. The problem is directly motivated by an application in the telecommunications industry whose aim is to uncover customers' churning behavior. We model it as a sequential classification problem and present an effective solution for a challenging setting in which the elements of the sequences are multi-dimensional, the sequences vary in length, and the classes are highly unbalanced. Our solution integrates model-based clustering into a new preprocessing algorithm for the time series data. In this paper, we provide the theory and algorithms for the task, and empirically demonstrate that the method is effective in determining the customer class for customer relationship management (CRM) applications in the telecommunications industry.
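The abstract does not spell out the preprocessing algorithm itself. As a rough illustration only, the sketch below shows one generic way model-based clustering can turn uneven-length, multi-dimensional sequences into fixed-length vectors that a standard classifier can consume: fit a Gaussian mixture with diagonal covariances (an assumed choice of model) to all time steps pooled across customers, then describe each customer by the average posterior cluster membership over that customer's time steps. The function names (`fit_diag_gmm`, `responsibilities`, `sequence_features`), the synthetic data and all parameters are hypothetical. This is not the authors' algorithm, and it does not by itself address the class imbalance.

```python
import numpy as np


def responsibilities(X, weights, means, variances):
    """Posterior probability of each mixture component for each row of X."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2          # (n, k, d)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff2 / variances[None, :, :]).sum(axis=2)
    log_p = np.log(weights)[None, :] + log_pdf
    log_p -= log_p.max(axis=1, keepdims=True)                 # stabilise before exp
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k, n_iter=50, seed=0):
    """Fit a k-component diagonal-covariance Gaussian mixture to X by EM.

    X holds per-time-step feature vectors pooled from all sequences (n, d).
    Means are initialised by farthest-point selection (an assumed heuristic).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[dist.argmax()])
    means = np.array(means, dtype=float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-12                           # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        variances = (resp[:, :, None] * diff2).sum(axis=0) / nk[:, None] + 1e-6
    return weights, means, variances


def sequence_features(sequences, model):
    """Map each variable-length (t_i, d) sequence to a fixed-length k-vector
    of average cluster memberships, so all customers share one feature space."""
    return np.array([responsibilities(np.asarray(s, dtype=float), *model).mean(axis=0)
                     for s in sequences])


# Hypothetical data: 20 "common" customers and 3 "rare" ones, each with
# 3 to 8 monthly records of 2 usage features.
rng = np.random.default_rng(1)
seqs = [rng.normal(0, 1, size=(rng.integers(3, 9), 2)) for _ in range(20)]
seqs += [rng.normal(10, 1, size=(rng.integers(3, 9), 2)) for _ in range(3)]
model = fit_diag_gmm(np.vstack(seqs), k=2)
F = sequence_features(seqs, model)
print(F.shape)  # (23, 2)
```

Averaging memberships discards the order of events within a sequence; a sequence model such as a hidden Markov model would keep it, at higher cost. The resulting matrix `F` can then be passed to any classifier, with imbalance handled separately (for example by reweighting classes).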
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Y. et al. (2005). Preprocessing Time Series Data for Classification with Application to CRM. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_16
DOI: https://doi.org/10.1007/11589990_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)