Constrained Clustering

Encyclopedia of Machine Learning

Definition

Constrained clustering is a semisupervised approach to clustering data while incorporating domain knowledge in the form of constraints. The constraints are usually expressed as pairwise statements indicating that two items must, or cannot, be placed into the same cluster. Constrained clustering algorithms may enforce every constraint in the solution, or they may use the constraints as guidance rather than hard requirements.
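As a minimal sketch of the pairwise constraints described above, the following Python checks a candidate cluster assignment against must-link and cannot-link pairs. The function names and the data layout (a list of integer cluster labels indexed by item) are illustrative assumptions, not part of the entry. A method that enforces every constraint would accept only assignments with no violations, while a method that treats constraints as guidance could use the violation count as a penalty.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two items (hypothetical representation)


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> List[Tuple[str, int, int]]:
    """Return every pairwise constraint that the assignment `labels` breaks."""
    violations = []
    for i, j in must_link:
        # A must-link pair is violated when the items land in different clusters.
        if labels[i] != labels[j]:
            violations.append(("must-link", i, j))
    for i, j in cannot_link:
        # A cannot-link pair is violated when the items share a cluster.
        if labels[i] == labels[j]:
            violations.append(("cannot-link", i, j))
    return violations


def satisfies_all(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> bool:
    """Hard-constraint view: the assignment is acceptable only with zero violations."""
    return not violated_constraints(labels, must_link, cannot_link)


# Example: four items assigned to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(2, 3), (0, 3)]
broken = violated_constraints(labels, must_link, cannot_link)
```

In this example, items 1 and 2 are required to share a cluster but do not, and items 2 and 3 are required to be separated but are not, so two of the four constraints are violated.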

Motivation and Background

Unsupervised learning operates without any domain-specific guidance or preexisting knowledge. Supervised learning requires that all training examples be associated with labels. Yet existing knowledge for a problem domain often fits neither of these extremes. Semisupervised learning methods fill this gap by making use of both labeled and unlabeled data. Constrained clustering, a form of semisupervised learning, was developed to extend clustering algorithms to incorporate existing domain knowledge, when...

Recommended Reading

  • Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6, 937–965.

  • Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 59–68). Seattle, WA.

  • Basu, S., Davidson, I., & Wagstaff, K. (Eds.). (2008). Constrained Clustering: Advances in Algorithms, Theory, and Applications. Boca Raton, FL: CRC Press.

  • Bilenko, M., Basu, S., & Mooney, R. J. (2004). Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the Twenty-first International Conference on Machine Learning (pp. 11–18). Banff, AB, Canada.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  • Kamvar, S., Klein, D., & Manning, C. D. (2003). Spectral learning. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 561–566). Acapulco, Mexico.

  • Klein, D., Kamvar, S. D., & Manning, C. D. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proceedings of the Nineteenth International Conference on Machine Learning (pp. 307–313). Sydney, Australia.

  • Lu, Z. & Leen, T. (2005). Semi-supervised learning with penalized probabilistic clustering. In Advances in Neural Information Processing Systems (Vol. 17, pp. 849–856). Cambridge, MA: MIT Press.

  • MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley, CA: University of California Press.

  • Shental, N., Bar-Hillel, A., Hertz, T., & Weinshall, D. (2004). Computing Gaussian mixture models with EM using equivalence constraints. In Advances in Neural Information Processing Systems (Vol. 16, pp. 465–472). Cambridge, MA: MIT Press.

  • Wagstaff, K. & Cardie, C. (2000). Clustering with instance-level constraints. In Proceedings of the Seventeenth International Conference on Machine Learning (pp. 1103–1110). San Francisco: Morgan Kaufmann.

  • Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning (pp. 577–584). San Francisco: Morgan Kaufmann.

  • Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems (Vol. 15, pp. 505–512). Cambridge, MA: MIT Press.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Wagstaff, K.L. (2011). Constrained Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_163
