Kernel K-Means for Categorical Data

Couto, Julia

doi:10.1007/11552253_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Clustering categorical data is an important and challenging data analysis task. In this paper, we explore the use of kernel K-means to cluster categorical data. We propose a new kernel function based on Hamming distance to embed categorical data in a constructed feature space where the clustering is conducted. We experimentally evaluated the quality of the solutions produced by kernel K-means on real datasets. Results indicated the feasibility of kernel K-means using our proposed kernel function to discover clusters embedded in categorical data.
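The approach described in the abstract can be illustrated with a minimal sketch. The abstract does not give the definition of the proposed kernel, so the code below assumes a simple stand-in, k(x, y) = exp(-gamma * d_H(x, y)), where d_H is the Hamming distance; this is not the paper's kernel. The function names, the `gamma` parameter, the maximin initialization, and the toy records are all hypothetical choices made for illustration. Clustering uses the standard kernel K-means distance to a cluster mean in feature space, K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl, computed from the Gram matrix alone.

```python
import math


def hamming_distance(x, y):
    """Number of attribute positions where two equal-length records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def hamming_kernel(x, y, gamma=1.0):
    """Assumed stand-in kernel exp(-gamma * d_H); not the paper's definition."""
    return math.exp(-gamma * hamming_distance(x, y))


def gram_matrix(records, gamma=1.0):
    """Pairwise kernel values as a list of lists."""
    return [[hamming_kernel(x, y, gamma) for y in records] for x in records]


def maximin_init(gram, k):
    """Deterministic start: pick mutually distant seeds, assign to nearest seed."""
    n = len(gram)

    def dist(i, s):
        # Squared feature-space distance between points i and s.
        return gram[i][i] + gram[s][s] - 2.0 * gram[i][s]

    seeds = [0]
    while len(seeds) < k:
        # Next seed: the point whose nearest existing seed is farthest away.
        seeds.append(max(range(n), key=lambda i: min(dist(i, s) for s in seeds)))
    return [min(range(k), key=lambda c: dist(i, seeds[c])) for i in range(n)]


def kernel_kmeans(gram, k, max_iter=100):
    """Kernel K-means on a precomputed Gram matrix; returns labels in range(k)."""
    n = len(gram)
    labels = maximin_init(gram, k)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (third distance term).
        within = [
            sum(gram[i][j] for i in m for j in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(gram[i][j] for j in m) / len(m)
                d = gram[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy categorical records (hypothetical): two groups with no shared values.
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels = kernel_kmeans(gram_matrix(records), k=2)
print(labels)
```

Because the algorithm only touches the Gram matrix, a different categorical kernel can be substituted by changing `hamming_kernel` alone. Like ordinary K-means, the procedure converges to a local optimum that depends on the initialization.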

Author information

Authors and Affiliations

James Madison University, Harrisonburg, VA, 22807, USA
Julia Couto

Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Canada
A. Fazel Famili
LIACS, Leiden University, The Netherlands
Joost N. Kok
IFM, Linköping University, SE-58183, Linköping, Sweden
José M. Peña
Department of Computer Science, Universiteit Utrecht
Arno Siebes
Utrecht University, P.O. Box 80 089, NL-3508 TB Utrecht, the Netherlands
Ad Feelders

About this paper

Cite this paper

Couto, J. (2005). Kernel K-Means for Categorical Data. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_5

DOI: https://doi.org/10.1007/11552253_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)