A Self-representation Model for Robust Clustering of Categorical Sequences

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11268)

Abstract

Robust clustering of categorical sequences remains an open and challenging task, owing to noisy data and the lack of an inherently meaningful measure of pairwise similarity between sequences. In this paper, a self-representation model is proposed for representing categorical sequences. Based on this model, we transform robust clustering into a subspace clustering problem. Furthermore, an efficient algorithm for robust clustering of categorical sequences is proposed, which yields high-quality clustering results under the new measure and eliminates noise sequences using the subspace method. Experimental results on synthetic and real-world data demonstrate the promising performance of the proposed method.
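
The abstract gives only the outline of the method, but the pipeline it describes can be sketched concretely: embed each categorical sequence as a feature vector, express every sequence as a combination of the other sequences (self-representation), and cluster on the resulting affinity. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes n-gram count features, an L1-regularised self-representation step in the style of sparse subspace clustering, and off-the-shelf spectral clustering; the function names, parameters, and toy data are hypothetical.

```python
# Minimal sketch (not the paper's exact algorithm): categorical sequences are
# embedded as n-gram count vectors, each vector is self-represented as a sparse
# combination of the others, and spectral clustering is run on the affinity.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ngram_features(sequences, n=2):
    """Map each categorical sequence (list of symbols) to an n-gram count vector."""
    vocab, rows = {}, []
    for seq in sequences:
        counts = {}
        for i in range(len(seq) - n + 1):
            g = tuple(seq[i:i + n])
            vocab.setdefault(g, len(vocab))
            counts[g] = counts.get(g, 0) + 1
        rows.append(counts)
    X = np.zeros((len(sequences), len(vocab)))
    for r, counts in enumerate(rows):
        for g, c in counts.items():
            X[r, vocab[g]] = c
    # L2-normalise so sequence length does not dominate the representation
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)

def self_representation_affinity(X, alpha=0.05):
    """Express each sample as a sparse combination of the others; build an affinity matrix."""
    n_samples = X.shape[0]
    C = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        others = np.delete(X, i, axis=0)          # exclude x_i itself
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(others.T, X[i])                 # x_i ~ sum_j c_j x_j, j != i
        C[i] = np.insert(lasso.coef_, i, 0.0)     # put the zero self-coefficient back
    return np.abs(C) + np.abs(C).T                # symmetric affinity

# Toy usage with two generating regimes (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
seqs = [rng.choice(list("AB"), size=30).tolist() for _ in range(10)] + \
       [rng.choice(list("CD"), size=30).tolist() for _ in range(10)]
X = ngram_features(seqs, n=2)
W = self_representation_affinity(X)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```

Symmetrising the coefficient matrix as |C| + |C|^T before spectral clustering is the standard affinity construction in sparse subspace clustering; a robust variant would additionally use the self-representation residuals to flag and discard noise sequences.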

Supported by the National Natural Science Foundation of China under Grant 61672157 and by the Innovative Research Team of Probability and Statistics: Theory and Application (IRTL1704).

Author information

Corresponding author

Correspondence to Lifei Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, K., Chen, L., Wang, S., Wang, B. (2018). A Self-representation Model for Robust Clustering of Categorical Sequences. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-01298-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01297-7

  • Online ISBN: 978-3-030-01298-4

  • eBook Packages: Computer Science, Computer Science (R0)
