Constrained Clustering

Wagstaff, Kiri L.

doi:10.1007/978-1-4899-7687-1_163

Definition

Constrained clustering is a semisupervised approach to clustering data while incorporating domain knowledge in the form of constraints. The constraints are usually expressed as pairwise statements indicating that two items must, or cannot, be placed into the same cluster. Constrained clustering algorithms may enforce every constraint in the solution, or they may use the constraints as guidance rather than hard requirements.
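As a minimal sketch of the definition above, the Python below represents pairwise constraints as index pairs and checks a cluster assignment against them. The names `count_violations`, `satisfies_all`, and `penalized_cost`, and the additive `weight` penalty, are illustrative assumptions and do not come from the entry or from any particular published algorithm.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two items in the data set


def count_violations(labels: Sequence[int],
                     must_link: Iterable[Pair],
                     cannot_link: Iterable[Pair]) -> int:
    """Count the pairwise constraints that a cluster assignment breaks."""
    violated = 0
    for i, j in must_link:
        # A must-link pair is violated when the items are in different clusters.
        if labels[i] != labels[j]:
            violated += 1
    for i, j in cannot_link:
        # A cannot-link pair is violated when the items share a cluster.
        if labels[i] == labels[j]:
            violated += 1
    return violated


def satisfies_all(labels: Sequence[int],
                  must_link: Iterable[Pair],
                  cannot_link: Iterable[Pair]) -> bool:
    """Hard-constraint view: accept a solution only if nothing is violated."""
    return count_violations(labels, must_link, cannot_link) == 0


def penalized_cost(base_cost: float,
                   labels: Sequence[int],
                   must_link: Iterable[Pair],
                   cannot_link: Iterable[Pair],
                   weight: float = 1.0) -> float:
    """Soft-constraint view (assumed form): add a penalty per violation."""
    return base_cost + weight * count_violations(labels, must_link, cannot_link)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # cluster index for each of four items
    must_link = [(0, 1)]             # items 0 and 1 belong together
    cannot_link = [(1, 2), (2, 3)]   # these pairs must be separated
    print(count_violations(labels, must_link, cannot_link))
    print(satisfies_all(labels, must_link, cannot_link))
```

An algorithm that enforces every constraint would reject any assignment for which `satisfies_all` is false, while one that treats constraints as guidance could instead minimize something like `penalized_cost`, trading clustering quality against the number of violated constraints.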

Motivation and Background

Unsupervised learning operates without any domain-specific guidance or preexisting knowledge. Supervised learning requires that all training examples be associated with labels. Yet existing knowledge for a problem domain often fits neither of these extremes. Semisupervised learning methods fill this gap by making use of both labeled and unlabeled data. Constrained clustering, a form of semisupervised learning, was developed to extend clustering algorithms to incorporate existing domain knowledge, when...

Author information

Authors and Affiliations

Pasadena, CA, USA
Kiri L. Wagstaff

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Claude Sammut
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Geoffrey I. Webb

Cite this entry

Wagstaff, K.L. (2017). Constrained Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_163

DOI: https://doi.org/10.1007/978-1-4899-7687-1_163
Published: 14 April 2017
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
