A Self-representation Model for Robust Clustering of Categorical Sequences

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11268)

Abstract

Robust clustering of categorical sequences remains an open and challenging task, owing to noisy data and the lack of an inherently meaningful measure of pairwise similarity between sequences. In this paper, a self-representation model is proposed for representing categorical sequences. Based on this model, we transform robust clustering into a subspace clustering problem. Furthermore, an efficient algorithm for robust clustering of categorical sequences is proposed, which yields high-quality clustering results under the new measure and eliminates noise sequences using the subspace method. Experimental results on synthetic and real-world data demonstrate the promising performance of the proposed method.
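
The abstract gives only the outline of the method, but the pipeline it describes can be sketched concretely: embed each categorical sequence as a feature vector, express every sequence as a combination of the other sequences (self-representation), and cluster on the resulting affinity. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes n-gram count features, an L1-regularised self-representation step in the style of sparse subspace clustering, and off-the-shelf spectral clustering; the function names, parameters, and toy data are hypothetical.

```python
# Minimal sketch (not the paper's exact algorithm): categorical sequences are
# embedded as n-gram count vectors, each vector is self-represented as a sparse
# combination of the others, and spectral clustering is run on the affinity.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ngram_features(sequences, n=2):
    """Map each categorical sequence (list of symbols) to an n-gram count vector."""
    vocab, rows = {}, []
    for seq in sequences:
        counts = {}
        for i in range(len(seq) - n + 1):
            g = tuple(seq[i:i + n])
            vocab.setdefault(g, len(vocab))
            counts[g] = counts.get(g, 0) + 1
        rows.append(counts)
    X = np.zeros((len(sequences), len(vocab)))
    for r, counts in enumerate(rows):
        for g, c in counts.items():
            X[r, vocab[g]] = c
    # L2-normalise so sequence length does not dominate the representation
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)

def self_representation_affinity(X, alpha=0.05):
    """Express each sample as a sparse combination of the others; build an affinity matrix."""
    n_samples = X.shape[0]
    C = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        others = np.delete(X, i, axis=0)          # exclude x_i itself
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(others.T, X[i])                 # x_i ~ sum_j c_j x_j, j != i
        C[i] = np.insert(lasso.coef_, i, 0.0)     # put the zero self-coefficient back
    return np.abs(C) + np.abs(C).T                # symmetric affinity

# Toy usage with two generating regimes (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
seqs = [rng.choice(list("AB"), size=30).tolist() for _ in range(10)] + \
       [rng.choice(list("CD"), size=30).tolist() for _ in range(10)]
X = ngram_features(seqs, n=2)
W = self_representation_affinity(X)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```

Symmetrising the coefficient matrix as |C| + |C|^T before spectral clustering is the standard affinity construction in sparse subspace clustering; a robust variant would additionally use the self-representation residuals to flag and discard noise sequences.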

Supported by the National Natural Science Foundation of China under Grant 61672157 and by the Innovative Research Team of Probability and Statistics: Theory and Application (IRTL1704).

Author information

Corresponding author

Correspondence to Lifei Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, K., Chen, L., Wang, S., Wang, B. (2018). A Self-representation Model for Robust Clustering of Categorical Sequences. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-01298-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01297-7

  • Online ISBN: 978-3-030-01298-4

  • eBook Packages: Computer Science, Computer Science (R0)
