Definition
In its rawest form, correlation clustering is a graph optimization problem. Consider a clustering C to be a mapping from the elements to be clustered, V, to the set {1, …, |V|}, so that u and v are in the same cluster if and only if C[u] = C[v]. Given a collection of items in which each pair (u, v) has two weights $w_{uv}^{+}$ and $w_{uv}^{-}$, we must find a clustering C that minimizes
$$\sum_{C[u] = C[v]} w_{uv}^{-} + \sum_{C[u] \neq C[v]} w_{uv}^{+},$$
or, equivalently, maximizes
$$\sum_{C[u] = C[v]} w_{uv}^{+} + \sum_{C[u] \neq C[v]} w_{uv}^{-}.$$
Note that although $w_{uv}^{+}$ and $w_{uv}^{-}$ may be thought of as positive and negative evidence towards coassociation, the actual weights are nonnegative.
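As a rough illustration of the definition, the sketch below scores a clustering under the usual reading of the two objectives: a pair placed in the same cluster incurs its negative weight as a disagreement, and a pair placed in different clusters incurs its positive weight. The function names `disagreements` and `agreements`, the dictionary representation of the weights, and the three-item instance are all assumptions made for this example, not part of the entry. Since every pair contributes one of its two weights to each objective, the two scores always sum to the total weight, which is why minimizing one is the same as maximizing the other.

```python
from itertools import combinations


def disagreements(clustering, w_plus, w_minus):
    """Weight of disagreements: w- within clusters plus w+ across clusters.

    clustering maps each item to a cluster label; w_plus and w_minus map
    pairs (u, v), with u < v, to nonnegative weights (missing pairs count as 0).
    """
    cost = 0.0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            cost += w_minus.get((u, v), 0.0)  # co-clustered despite negative evidence
        else:
            cost += w_plus.get((u, v), 0.0)   # separated despite positive evidence
    return cost


def agreements(clustering, w_plus, w_minus):
    """Weight of agreements: w+ within clusters plus w- across clusters."""
    score = 0.0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            score += w_plus.get((u, v), 0.0)
        else:
            score += w_minus.get((u, v), 0.0)
    return score


# Hypothetical toy instance with three items (illustrative weights only).
w_plus = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "c"): 0.5}
w_minus = {("a", "b"): 0.5, ("a", "c"): 2.0, ("b", "c"): 2.0}
C = {"a": 1, "b": 1, "c": 2}  # a and b together, c on its own

total_weight = sum(w_plus.values()) + sum(w_minus.values())
print(disagreements(C, w_plus, w_minus))  # 2.0
print(agreements(C, w_plus, w_minus))     # 7.0
print(total_weight)                       # 9.0
```

On this instance, grouping a with b and leaving c alone has disagreement weight 2.0, whereas placing all three items in one cluster has disagreement weight 4.5, so the first clustering is preferred under either objective.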
Motivation and Background
The notion of clustering with advice, that is nonmetric-driven relations between items,...
Recommended Reading
Ailon, N., Charikar, M., & Newman, A. (2005). Aggregating inconsistent information: Ranking and clustering. In Proceedings of the Thirty-Seventh ACM Symposium on the Theory of Computing (pp. 684–693). New York: ACM Press.
Alon, N., Makarychev, K., Makarychev, Y., & Naor, A. (2006). Quadratic forms on graphs. Inventiones Mathematicae, 163(3), 499–522.
Arora, S., Berger, E., Hazan, E., Kindler, G., & Safra, S. (2005). On non-approximability for quadratic programs. In Proceedings of Forty-Sixth Symposium on Foundations of Computer Science. (pp. 206–215). Washington DC: IEEE Computer Society.
Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. In Proceedings of the Forty-Third Annual IEEE Symposium on Foundations of Computer Science (pp. 238–247). Washington, DC: IEEE Computer Society.
Ben-Dor, A., Shamir, R., & Yakhini, Z. (1999). Clustering gene expression patterns. Journal of Computational Biology, 6, 281–297.
Bertolacci, M., & Wirth, A. (2007). Are approximation algorithms for consensus clustering worthwhile? In Proceedings of Seventh SIAM International Conference on Data Mining. (pp. 437–442). Philadelphia: SIAM.
Charikar, M., Guruswami, V., & Wirth, A. (2003). Clustering with qualitative information. In Proceedings of forty fourth FOCS (pp. 524–533).
Charikar, M., & Wirth, A. (2004). Maximizing quadratic programs: Extending Grothendieck’s inequality. In Proceedings of forty fifth FOCS (pp. 54–60).
Daume, H. (2006). Practical structured learning techniques for natural language processing. PhD thesis, University of Southern California.
Davidson, I., & Ravi, S. (2005). Clustering with constraints: Feasibility issues and the k-means algorithm. In Proceedings of Fifth SIAM International Conference on Data Mining.
Demaine, E., Emanuel, D., Fiat, A., & Immorlica, N. (2006). Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2), 172–187.
Demaine, E., & Immorlica, N. (2003). Correlation clustering with partial information. In Proceedings of Sixth Workshop on Approximation Algorithms for Combinatorial Optimization Problems. (pp. 1–13).
Emanuel, D., & Fiat, A. (2003). Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In Proceedings of Eleventh European Symposium on Algorithms (pp. 208–220).
Ferligoj, A., & Batagelj, V. (1982). Clustering with relational constraint. Psychometrika, 47(4), 413–426.
Finley, T., & Joachims, T. (2005). Supervised clustering with support vector machines. In Proceedings of Twenty-Second International Conference on Machine Learning.
Gionis, A., Mannila, H., & Tsaparas, P. (2005). Clustering aggregation. In Proceedings of Twenty-First International Conference on Data Engineering. To appear.
Gramm, J., Guo, J., Hüffner, F., & Niedermeier, R. (2004). Automated generation of search tree algorithms for hard graph modification problems. Algorithmica, 39(4), 321–347.
Kulis, B., Basu, S., Dhillon, I., & Mooney, R. (2005). Semi-supervised graph clustering: A kernel approach. In Proceedings of Twenty-Second International Conference on Machine Learning (pp. 457–464).
McCallum, A., & Wellner, B. (2005). Conditional models of identity uncertainty with application to noun coreference. In L. Saul, Y. Weiss, & L. Bottou, (Eds.), Advances in neural information processing systems 17 (pp. 905–912). Cambridge, MA: MIT Press.
Meilă, M. (2003). Comparing clusterings by the variation of information. In Proceedings of Sixteenth Conference on Learning Theory (pp. 173–187).
Shamir, R., Sharan, R., & Tsur, D. (2004). Cluster graph modification problems. Discrete Applied Mathematics, 144, 173–182.
Swamy, C. (2004). Correlation Clustering: Maximizing agreements via semidefinite programming. In Proceedings of Fifteenth ACM-SIAM Symposium on Discrete Algorithms (pp. 519–520).
Tan, J. (2007). A Note on the inapproximability of correlation clustering. Technical Report 0704.2092, eprint arXiv, 2007.
© 2011 Springer Science+Business Media, LLC
Cite this entry
Wirth, A. (2011). Correlation Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_176
DOI: https://doi.org/10.1007/978-0-387-30164-8_176
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering