Categorical Data Clustering

Andritsos, Periklis; Tsaparas, Panayiotis

doi:10.1007/978-0-387-30164-8_99

Synonyms

Clustering of nonnumerical data; Grouping

Definition

Data clustering is informally defined as the problem of partitioning a set of objects into groups, such that the objects in the same group are similar, while the objects in different groups are dissimilar. Categorical data clustering refers to the case where the data objects are defined over categorical attributes. A categorical attribute is an attribute whose domain is a set of discrete values that are not inherently comparable. That is, there is no single ordering or inherent distance function for the categorical values, and there is no mapping from categorical to numerical values that is semantically meaningful.
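
The point that categorical values have no inherent ordering or distance can be illustrated with a small sketch. The attribute values, the two integer encodings, and the function names below are hypothetical and chosen only for illustration. The mismatch count shown at the end is one commonly used choice for comparing categorical objects (it is the simple matching dissimilarity used, for example, in k-modes-style methods); the definition above does not prescribe it.

```python
from typing import Dict, Sequence


def label_encoded_distance(a: str, b: str, codes: Dict[str, int]) -> int:
    """Absolute difference between two values under an integer encoding.

    The encoding is arbitrary, so the resulting "distance" is too.
    """
    return abs(codes[a] - codes[b])


def simple_matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Count the attributes on which two categorical objects disagree."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


# Two equally valid (and equally meaningless) encodings of the same domain.
encoding_a = {"red": 0, "green": 1, "blue": 2}
encoding_b = {"green": 0, "blue": 1, "red": 2}

# The numeric gap between "red" and "blue" depends only on the encoding.
d_a = label_encoded_distance("red", "blue", encoding_a)
d_b = label_encoded_distance("red", "blue", encoding_b)

# Comparing objects attribute by attribute avoids imposing any ordering.
obj1 = ("red", "sedan", "diesel")
obj2 = ("blue", "sedan", "petrol")
mismatches = simple_matching_dissimilarity(obj1, obj2)
```

Because `d_a` and `d_b` differ for the same pair of values, any numeric distance computed after such an encoding reflects the encoding rather than the data. The mismatch count depends only on whether values are equal.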

Motivation and Background

Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. As storage capacities grow, ever larger amounts of data become available for analysis and mining. Clustering plays an instrumental role in this...

Recommended Reading

Andritsos, P., Tsaparas, P., Miller, R. J., & Sevcik, K. C. (2004). LIMBO: Scalable clustering of categorical data. In Proceedings of the 9th international conference on extending database technology (EDBT) (pp. 123–146). Heraklion, Greece.
Barbarà, D., Couto, J., & Li, Y. (2002). COOLCAT: An entropy-based algorithm for categorical clustering. In Proceedings of the 11th international conference on information and knowledge management (CIKM) (pp. 582–589). McLean, VA.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.
Das, G., & Mannila, H. (2000). Context-based similarity measures for categorical databases. In Proceedings of the 4th European conference on principles of data mining and knowledge discovery (PKDD) (pp. 201–210). Lyon, France.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.
Ganti, V., Gehrke, J., & Ramakrishnan, R. (1999). CACTUS: Clustering categorical data using summaries. In Proceedings of the 5th international conference on knowledge discovery and data mining (KDD) (pp. 73–83). San Diego, CA.
Gionis, A., Mannila, H., & Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), Article No 4.
Gluck, M., & Corter, J. (1985). Information, uncertainty, and the utility of categories. In Proceedings of the 7th annual conference of the cognitive science society (COGSCI) (pp. 283–287). Irvine, CA.
Guha, S., Rastogi, R., & Shim, K. (1999). ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of the 15th international conference on data engineering (pp. 512–521). Sydney, Australia.
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), 283–304.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs, NJ: Prentice-Hall.
Jarke, M., Lenzerini, M., Vassiliou, Y., & Vassiliadis, P. (1999). Fundamentals of data warehouses. Berlin: Springer.
Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Zaki, M. J., Peters, M., Assent, I., & Seidl, T. (2005). CLICKS: An effective algorithm for mining subspace clusters in categorical datasets. In Proceedings of the 11th international conference on knowledge discovery and data mining (KDD) (pp. 736–742). Chicago, IL.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2052
Claude Sammut
Faculty of Information Technology, Clayton School of Information Technology, Monash University, P.O. Box 63, Victoria, Australia, 3800
Geoffrey I. Webb

About this entry

Cite this entry

Andritsos, P., Tsaparas, P. (2011). Categorical Data Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_99

DOI: https://doi.org/10.1007/978-0-387-30164-8_99
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
