Kernel K-Means for Categorical Data

  • Conference paper
Advances in Intelligent Data Analysis VI (IDA 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

Clustering categorical data is an important and challenging data analysis task. In this paper, we explore the use of kernel K-means to cluster categorical data. We propose a new kernel function based on Hamming distance to embed categorical data in a constructed feature space where the clustering is conducted. We experimentally evaluated the quality of the solutions produced by kernel K-means on real datasets. Results indicated the feasibility of kernel K-means using our proposed kernel function to discover clusters embedded in categorical data.
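The abstract does not state the form of the proposed kernel, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes integer-coded categorical attributes and uses an illustrative kernel, exp(-gamma * Hamming distance), which is positive semi-definite because it factorizes into a product of per-attribute kernels. The function names, the `gamma` parameter, and the seed-based initialization are all hypothetical choices made for this example. Kernel K-means then assigns each point to the cluster whose feature-space centroid is nearest, using only kernel values.

```python
import numpy as np

def hamming_kernel(X, gamma=0.5):
    """Illustrative kernel (an assumption, not the paper's definition):
    k(x, y) = exp(-gamma * number of attributes on which x and y differ)."""
    X = np.asarray(X)
    # Pairwise Hamming distance between all rows of X.
    dist = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    return np.exp(-gamma * dist)

def kernel_kmeans(K, seeds, max_iter=100):
    """Kernel K-means on a precomputed kernel matrix K.
    `seeds` (hypothetical parameter) lists the row indices used as initial
    cluster representatives; one cluster is created per seed."""
    n, k = K.shape[0], len(seeds)
    # Initial assignment: each point joins its most similar seed.
    labels = K[:, seeds].argmax(axis=1)
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays unreachable
            # ||phi(x_i) - m_c||^2 = K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_jl K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy data: two groups of records over four categorical attributes,
# with categories coded as integers.
X = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0],
     [2, 2, 2, 2], [2, 2, 2, 1], [2, 2, 1, 2]]
K = hamming_kernel(X, gamma=0.5)
labels = kernel_kmeans(K, seeds=[0, 3])
print(labels)  # prints [0 0 0 1 1 1]
```

Like ordinary K-means, this procedure only reaches a local optimum, so the result depends on the initial assignment; in practice one would typically try several initializations and compare the resulting clusterings.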

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Couto, J. (2005). Kernel K-Means for Categorical Data. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_5

  • DOI: https://doi.org/10.1007/11552253_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28795-7

  • Online ISBN: 978-3-540-31926-9

  • eBook Packages: Computer Science (R0)
