Synonyms
Unsupervised learning
Definition
Clustering is the assignment of objects to groups of similar objects (clusters). The objects are typically described as vectors of features (also called attributes). Attributes can be numerical (scalar) or categorical. The assignment can be hard, where each object belongs to exactly one cluster, or fuzzy, where an object can belong to several clusters with a degree of membership (for example, a probability) in each. The clusters can overlap, though typically they are disjoint. A distance measure is a function that quantifies how dissimilar two objects are: the smaller the distance, the more similar the objects.
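The notions of distance and of hard versus fuzzy assignment can be illustrated with a minimal Python sketch. It assumes Euclidean distance for numerical attributes and a simple mismatch count for categorical ones (common choices, not the only ones). The fuzzy memberships follow the fuzzy c-means membership rule (Bezdek et al. in the Recommended Reading) with an assumed fuzzifier m = 2; they are degrees of membership that sum to 1 across clusters. All function names are illustrative, not taken from any library.

```python
import math

def euclidean(x, y):
    """Distance between numerical feature vectors: smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mismatch(x, y):
    """Simple matching distance for categorical features: number of positions that differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def hard_assign(point, centers, dist=euclidean):
    """Hard assignment: the index of the single nearest cluster center."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))

def fuzzy_memberships(point, centers, m=2.0, dist=euclidean):
    """Fuzzy assignment: a membership degree for every cluster, summing to 1.

    Uses the fuzzy c-means membership rule
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    d = [dist(point, c) for c in centers]
    # A point that coincides with a center belongs entirely to that cluster.
    for j, dj in enumerate(d):
        if dj == 0:
            return [1.0 if k == j else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** p for k in range(len(d))) for j in range(len(d))]

if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    print(hard_assign((1.0, 0.0), centers))              # 0: the nearer center
    print(fuzzy_memberships((1.0, 0.0), centers))        # approximately [0.9, 0.1]
    print(mismatch(("red", "small"), ("red", "large")))  # 1
```

The hard assignment picks one cluster outright, while the fuzzy version spreads the same point over both clusters in inverse proportion to (a power of) its distance to each center.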
Historical Background
Clustering is one of the most useful tasks in data analysis. Its goal is to discover groups of similar objects and to identify interesting patterns in the data. Typically, the clustering problem is to partition a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters [4, 8]. For example, consider a retail...
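The partitioning view can be made concrete with k-means, the method described by MacQueen in the Recommended Reading. The sketch below is a batch (Lloyd-style) variant rather than MacQueen's original formulation, which updates the means incrementally as points are processed. It assumes Euclidean distance, a number of clusters k fixed in advance, and random initialization from the data points. The helper `mean_intra_inter` (a hypothetical name) checks the stated goal directly by comparing the average distance between points in the same cluster with the average distance between points in different clusters.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Batch (Lloyd-style) k-means with Euclidean distance.

    Alternates an assignment step and an update step until the centroids stop moving.
    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = []
    for _ in range(iters):
        # Assignment step: hard-assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return labels, centroids

def mean_intra_inter(points, labels):
    """Average pairwise distance within clusters and between clusters.

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points each.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, _ = kmeans(pts, k=2)
    intra, inter = mean_intra_inter(pts, labels)
    print(intra < inter)  # True for this well-separated toy data
```

k-means encodes one particular notion of similarity (distance to a cluster mean) and favors compact, roughly spherical clusters; the density-based and hierarchical methods in the reading list (for example, the algorithm of Ester et al., BIRCH, and CHAMELEON) make different choices. `math.dist` requires Python 3.8 or later.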
Recommended Reading
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.
Bezdek JC, Ehrlich R, Full W. FCM: the fuzzy c-means clustering algorithm. Comput Geosci. 1984;10(2–3):191–203.
Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31.
Everitt BS, Landau S, Leese M. Cluster analysis. London: Hodder Arnold; 2001.
Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in knowledge discovery and data mining. Menlo Park: AAAI Press; 1996.
Han J, Kamber M. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers; 2001.
Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery; 1997.
Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. 1999;31(3):264–323.
Karypis G, Han E-H, Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 1999;32(8):68–75.
MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; 1967. p. 281–97.
Mitchell T. Machine learning. New York: McGraw-Hill; 1997.
Ng R, Han J. Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases; 1994. p. 144–55.
Theodoridis S, Koutroumbas K. Pattern recognition. New York: Academic; 1999.
Vazirgiannis M, Halkidi M, Gunopulos D. Uncertainty handling and quality assessment in data mining. New York: Springer; 2003.
Wang W, Yang J, Muntz R. STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 186–95.
Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Gunopulos, D. (2018). Clustering Overview and Applications. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_602
DOI: https://doi.org/10.1007/978-1-4614-8265-9_602
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering