Abstract
Robust clustering of categorical sequences remains an open and challenging task because of noisy data and the lack of an inherently meaningful measure of pairwise similarity between sequences. In this paper, we propose a self-representation model for categorical sequences. Based on this model, we transform robust clustering into a subspace clustering problem. We also propose an efficient algorithm for robust clustering of categorical sequences, which yields a new similarity measure, high-quality clustering results, and the elimination of noise sequences by means of the subspace method. Experimental results on synthetic and real-world data demonstrate the promising performance of the proposed method.
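The abstract does not spell out the model itself, so the following is only a generic sketch of self-representation as used in subspace clustering, not the algorithm of the paper. It assumes a bigram-frequency featurization, a ridge-regularized least-squares self-representation, and a relative-residual threshold for flagging noise sequences. The function names, the regularization weight `lam`, and the threshold `0.5` are all hypothetical choices made for illustration.

```python
import numpy as np
from itertools import product


def bigram_features(seq, alphabet):
    """Normalized bigram frequencies of a sequence (assumed featurization)."""
    index = {a + b: k for k, (a, b) in enumerate(product(alphabet, repeat=2))}
    v = np.zeros(len(index))
    for a, b in zip(seq, seq[1:]):
        v[index[a + b]] += 1.0
    total = v.sum()
    return v / total if total > 0 else v


def self_representation(X, lam=0.1):
    """Approximate each row of X by a ridge-regularized combination of the other rows."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)       # exclude the row itself
        A = X[others]
        G = A @ A.T + lam * np.eye(n - 1)         # regularized Gram matrix
        C[i, others] = np.linalg.solve(G, A @ X[i])
    return C


# Toy data: two groups with distinct bigram patterns plus one unrelated sequence.
sequences = ["abababab", "babababa", "ababab",
             "cdcdcdcd", "dcdcdcdc", "cdcdcd",
             "adbcadbc"]
X = np.array([bigram_features(s, "abcd") for s in sequences])

C = self_representation(X, lam=0.1)
W = np.abs(C) + np.abs(C).T                       # symmetric affinity matrix

# Sequences that the others cannot reconstruct are treated as noise (hypothetical rule).
norms = np.maximum(np.linalg.norm(X, axis=1), 1e-12)
rel_residual = np.linalg.norm(X - C @ X, axis=1) / norms
flags = rel_residual > 0.5
```

In a full pipeline of this kind, the affinity matrix `W` would typically be passed to a spectral clustering step after the flagged sequences are removed. That step is omitted here to keep the sketch short.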
Supported by the National Natural Science Foundation of China under Grant 61672157; Innovative Research Team of Probability and Statistics: Theory and Application (IRTL1704).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, K., Chen, L., Wang, S., Wang, B. (2018). A Self-representation Model for Robust Clustering of Categorical Sequences. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_2
DOI: https://doi.org/10.1007/978-3-030-01298-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01297-7
Online ISBN: 978-3-030-01298-4
eBook Packages: Computer Science, Computer Science (R0)