Correlation Clustering

Wirth, Anthony

doi:10.1007/978-0-387-30164-8_176

Synonyms

Clustering with advice; Clustering with constraints; Clustering with qualitative information; Clustering with side information

Definition

In its rawest form, correlation clustering is a graph optimization problem. Consider a clustering C to be a mapping from the elements to be clustered, V, to the set {1, …, |V|}, so that u and v are in the same cluster if and only if C[u] = C[v]. Given a collection of items in which each pair (u, v) has two weights $w_{uv}^{+}$ and $w_{uv}^{-}$, we must find a clustering C that minimizes

$$\sum_{C[u]=C[v]} w_{uv}^{-} + \sum_{C[u]\neq C[v]} w_{uv}^{+}\,, \tag{1}$$

or, equivalently, maximizes

$$\sum_{C[u]=C[v]} w_{uv}^{+} + \sum_{C[u]\neq C[v]} w_{uv}^{-}\,. \tag{2}$$
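
One way to see why minimizing (1) and maximizing (2) are equivalent is to add the two objectives. Each pair (u, v) is either inside a cluster or split across clusters, so it contributes both of its weights exactly once to the sum:

$$\Bigl(\sum_{C[u]=C[v]} w_{uv}^{-} + \sum_{C[u]\neq C[v]} w_{uv}^{+}\Bigr) + \Bigl(\sum_{C[u]=C[v]} w_{uv}^{+} + \sum_{C[u]\neq C[v]} w_{uv}^{-}\Bigr) = \sum_{u,v} \bigl(w_{uv}^{+} + w_{uv}^{-}\bigr)\,.$$

The right-hand side does not depend on C, so any clustering that lowers (1) raises (2) by the same amount.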

Note that although $w_{uv}^{+}$ and $w_{uv}^{-}$ may be thought of as positive and negative evidence towards coassociation, the actual weights are nonnegative.
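
The definition can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not an algorithm from this entry: the function names, the dictionary encoding of the weights (keyed by sorted pairs), and the three-item example are all hypothetical. The brute-force search simply tries every labeling and is only feasible for a handful of items.

```python
from itertools import combinations, product


def disagreement_cost(clustering, w_plus, w_minus):
    """Objective (1): w- for pairs placed together, w+ for pairs split apart."""
    cost = 0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            cost += w_minus.get((u, v), 0)
        else:
            cost += w_plus.get((u, v), 0)
    return cost


def agreement_score(clustering, w_plus, w_minus):
    """Objective (2): w+ for pairs placed together, w- for pairs split apart."""
    score = 0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            score += w_plus.get((u, v), 0)
        else:
            score += w_minus.get((u, v), 0)
    return score


def brute_force_min(items, w_plus, w_minus):
    """Try every mapping V -> {1, ..., |V|}; exponential, for tiny inputs only."""
    items = sorted(items)
    best = None
    for labels in product(range(1, len(items) + 1), repeat=len(items)):
        clustering = dict(zip(items, labels))
        cost = disagreement_cost(clustering, w_plus, w_minus)
        if best is None or cost < best[0]:
            best = (cost, clustering)
    return best


# Hypothetical example weights (nonnegative), keyed by sorted pairs.
w_plus = {("a", "b"): 3, ("a", "c"): 0, ("b", "c"): 1}
w_minus = {("a", "b"): 0, ("a", "c"): 2, ("b", "c"): 1}

example = {"a": 1, "b": 1, "c": 2}
example_cost = disagreement_cost(example, w_plus, w_minus)
example_score = agreement_score(example, w_plus, w_minus)
total_weight = sum(w_plus.values()) + sum(w_minus.values())
best_cost, best_clustering = brute_force_min("abc", w_plus, w_minus)
```

In this example the clustering that puts a and b together and c alone has cost 1 under (1) and score 6 under (2), and the two values add up to the total weight 7, as the equivalence of the two objectives requires.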

Motivation and Background

The notion of clustering with advice, that is nonmetric-driven relations between items,...

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2052
Claude Sammut
Faculty of Information Technology, Clayton School of Information Technology, Monash University, P.O. Box 63, Victoria, Australia, 3800
Geoffrey I. Webb

Cite this entry

Wirth, A. (2011). Correlation Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_176

DOI: https://doi.org/10.1007/978-0-387-30164-8_176
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
