Abstract
Given a set of objects and nonnegative real weights expressing the “positive” and “negative” feeling of clustering any two objects together, min-disagreement correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Min-disagreement correlation clustering is \(\mathbf {APX}\)-hard, but efficient constant-factor approximation algorithms exist if the weights are bounded in some way. The weight bounds studied so far in the related literature are mostly local, as they are required to hold for every object pair. In this paper, we introduce the problem of min-disagreement correlation clustering with global weight bounds, i.e., constraints to be satisfied by the input weights altogether. Our main result is a sufficient condition that establishes when any algorithm achieving a certain approximation under the probability constraint (i.e., the positive and negative weights of every object pair sum to one) keeps the same guarantee on an input that violates the constraint. This extends the range of applicability of the most prominent existing correlation-clustering algorithms, including the popular Pivot, with both theoretical and practical benefits. Experiments demonstrate the usefulness of our approach, both in showing that employing existing efficient algorithms is worthwhile and in guiding the definition of weights from feature vectors in a fair-clustering task.
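The min-disagreement objective, the probability constraint, and Pivot can be made concrete with a short sketch. The Python below is illustrative only: the dict-of-frozenset weight representation and all function names are our own assumptions, Pivot is the textbook KwikCluster rule associated with [5] (an unclustered object joins the pivot when its positive weight with the pivot exceeds the negative one) rather than the authors' implementation, and the paper's construction for global weight bounds is not reproduced.

```python
import itertools
import random

# Assumed representation (not from the paper): w_pos and w_neg are dicts keyed
# by frozenset({u, v}) for distinct objects u, v; a missing pair has weight 0.


def disagreement_cost(clusters, w_pos, w_neg):
    """Min-disagreement objective: negative-type weights inside clusters
    plus positive-type weights across clusters."""
    label = {u: i for i, c in enumerate(clusters) for u in c}
    cost = 0.0
    for pair in set(w_pos) | set(w_neg):
        u, v = tuple(pair)
        if label[u] == label[v]:
            cost += w_neg.get(pair, 0.0)  # intra-cluster disagreement
        else:
            cost += w_pos.get(pair, 0.0)  # inter-cluster disagreement
    return cost


def satisfies_probability_constraint(objects, w_pos, w_neg, tol=1e-9):
    """True if w_pos[uv] + w_neg[uv] == 1 (within tol) for every object pair."""
    for u, v in itertools.combinations(objects, 2):
        pair = frozenset((u, v))
        if abs(w_pos.get(pair, 0.0) + w_neg.get(pair, 0.0) - 1.0) > tol:
            return False
    return True


def pivot(objects, w_pos, w_neg, seed=None):
    """Pivot (KwikCluster): repeatedly take a random unclustered object as
    pivot and cluster it with every unclustered object u such that
    w_pos[pivot, u] > w_neg[pivot, u]."""
    order = list(objects)
    random.Random(seed).shuffle(order)  # random pivot order
    unclustered = set(order)
    clusters = []
    for p in order:
        if p not in unclustered:
            continue  # already absorbed by an earlier pivot
        cluster = [p]
        for u in order:
            if u != p and u in unclustered:
                pair = frozenset((p, u))
                if w_pos.get(pair, 0.0) > w_neg.get(pair, 0.0):
                    cluster.append(u)
        unclustered.difference_update(cluster)
        clusters.append(cluster)
    return clusters
```

For example, with four objects where the pairs {0, 1} and {2, 3} have positive weight 0.9 (negative 0.1) and every other pair has positive weight 0.2 (negative 0.8), the weights satisfy the probability constraint, and Pivot returns the clusters {0, 1} and {2, 3} for any pivot order, with disagreement cost 0.1 + 0.1 + 4 × 0.2 = 1.0.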
Notes
- 1. In fact, a probability-constraint-compliant graph \(G'\) can be derived from \(G\) in linear time and space (statement (i) of our result). Pivot on \(G'\) yields a 5-approximate clustering in expectation [5]. A 5-approximate clustering on \(G'\) is a 5-approximate clustering on \(G\) (statement (ii) of our result).
- 2. Publicly available at http://konect.cc/networks/.
- 3. Experiments were carried out on the Cresco6 cluster (https://www.eneagrid.enea.it).
- 4.
- 5.
- 6. The cluster-size-weighted average of the per-attribute averages of the Euclidean distances between the attribute-frequency vector computed over the objects of a cluster and the attribute-frequency vector computed over the whole object set [1].
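The fairness measure in note 6 can be written as \(\sum_{C} \frac{|C|}{n} \cdot \frac{1}{|A|} \sum_{a \in A} \lVert f_a(C) - f_a(V) \rVert_2\), where \(f_a(S)\) is the relative-frequency vector of the values of sensitive attribute \(a\) over object set \(S\), \(V\) is the whole object set, and \(n = |V|\). This is one reading of the footnote, assuming categorical sensitive attributes and that "per-attribute average" means averaging over attributes for a fixed cluster. A minimal Python sketch under those assumptions; the function names and the dict-based input format are hypothetical.

```python
import math
from collections import Counter


def frequency_vector(values, domain):
    """Relative frequency of each value of `domain` among `values`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def average_euclidean_fairness(clusters, attributes):
    """Cluster-size-weighted average, over clusters, of the mean (over
    sensitive attributes) Euclidean distance between a cluster's
    attribute-frequency vector and the whole-set one.
    0.0 means every cluster mirrors the global distribution of every attribute.

    clusters:   list of lists of object ids (a partition of all objects)
    attributes: dict attribute_name -> dict object_id -> categorical value
    """
    objects = [o for c in clusters for o in c]
    n = len(objects)
    score = 0.0
    for cluster in clusters:
        dists = []
        for values in attributes.values():
            domain = sorted({values[o] for o in objects})
            global_f = frequency_vector([values[o] for o in objects], domain)
            local_f = frequency_vector([values[o] for o in cluster], domain)
            dists.append(math.dist(local_f, global_f))
        score += len(cluster) / n * (sum(dists) / len(dists))
    return score
```

For instance, with one attribute taking values F, M, F, M on objects 0 to 3, the partition {0, 1}, {2, 3} scores 0.0 (each cluster is balanced like the whole set), while {0, 2}, {1, 3} scores \(\sqrt{0.5} \approx 0.707\) (each cluster contains a single value).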
References
[1] Abraham, S.S., Sundaram, S.S.: Fairness in clustering with multiple sensitive attributes. In: Proceedings of EDBT Conference, pp. 287–298 (2020)
[2] Ahmadian, S., et al.: Fair hierarchical clustering. In: Proceedings of NIPS Conference (2020)
[3] Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of ACM KDD Conference, pp. 267–275 (2019)
[4] Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Fair correlation clustering. In: Proceedings of AISTATS Conference, pp. 4195–4205 (2020)
[5] Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. JACM 55(5), 23:1–23:27 (2008)
[6] Ausiello, G., Marchetti-Spaccamela, A., Crescenzi, P., Gambosi, G., Protasi, M., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-58412-1
[7] Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: Proceedings of ICML Conference, pp. 405–413 (2019)
[8] Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1), 89–113 (2004)
[9] Bera, S.K., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Proceedings of NIPS Conference, pp. 4955–4966 (2019)
[10] Bonchi, F., García-Soriano, D., Liberty, E.: Correlation clustering: from theory to practice. In: Proceedings of ACM KDD Conference, p. 1972 (2014)
[11] Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. JCSS 71(3), 360–383 (2005)
[12] Chawla, S., Makarychev, K., Schramm, T., Yaroslavtsev, G.: Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In: Proceedings of ACM STOC Symposium, pp. 219–228 (2015)
[13] Chen, X., Fain, B., Lyu, L., Munagala, K.: Proportionally fair clustering. In: Proceedings of ICML Conference, pp. 1032–1041 (2019)
[14] Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of NIPS Conference, pp. 5029–5037 (2017)
[15] Crescenzi, P.: A short guide to approximation preserving reductions. In: Proceedings of IEEE CCC Conference, pp. 262–273 (1997)
[16] Demaine, E.D., Emanuel, D., Fiat, A., Immorlica, N.: Correlation clustering in general weighted graphs. TCS 361(2–3), 172–187 (2006)
[17] Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM TKDD 1(1), 4 (2007)
[18] Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization. In: Proceedings of ICML Conference, pp. 3448–3457 (2019)
[19] Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: Proceedings of ICML Conference, pp. 3458–3467 (2019)
[20] Kollios, G., Potamias, M., Terzi, E.: Clustering large probabilistic graphs. IEEE TKDE 25(2), 325–336 (2013)
[21] Mandaglio, D., Tagarelli, A., Gullo, F.: In and out: optimizing overall interaction in probabilistic graphs under clustering constraints. In: Proceedings of ACM KDD Conference, pp. 1371–1381 (2020)
[22] Pandove, D., Goel, S., Rani, R.: Correlation clustering methodologies and their fundamental results. Expert. Syst. 35(1), e12229 (2018)
[23] Puleo, G.J., Milenkovic, O.: Correlation clustering with constrained cluster sizes and extended weights bounds. SIAM J. Optim. 25(3), 1857–1872 (2015)
[24] Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of ICALP Colloquium, vol. 107, pp. 96:1–96:14 (2018)
[25] Shamir, R., Sharan, R., Tsur, D.: Cluster graph modification problems. Discret. Appl. Math. 144(1–2), 173–182 (2004)
[26] Swamy, C.: Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of ACM-SIAM SODA Conference, pp. 526–527 (2004)
[27] van Zuylen, A., Williamson, D.P.: Deterministic algorithms for rank aggregation and other ranking and clustering problems. In: Proceedings of WAOA, pp. 260–273 (2007)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mandaglio, D., Tagarelli, A., Gullo, F. (2021). Correlation Clustering with Global Weight Bounds. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12976. Springer, Cham. https://doi.org/10.1007/978-3-030-86520-7_31
DOI: https://doi.org/10.1007/978-3-030-86520-7_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86519-1
Online ISBN: 978-3-030-86520-7
eBook Packages: Computer Science, Computer Science (R0)