Correlation Clustering with Global Weight Bounds

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12976)

Abstract

Given a set of objects and nonnegative real weights expressing the “positive” and “negative” feeling of clustering any two objects together, min-disagreement correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Min-disagreement correlation clustering is \(\mathbf {APX}\)-hard, but efficient constant-factor approximation algorithms exist if the weights are bounded in some way. The weight bounds studied so far in the related literature are mostly local, as they are required to hold for every object pair. In this paper, we introduce the problem of min-disagreement correlation clustering with global weight bounds, i.e., constraints to be satisfied by the input weights altogether. Our main result is a sufficient condition that establishes when any algorithm achieving a certain approximation under the probability constraint (i.e., the positive-type and negative-type weights of every object pair sum to 1) keeps the same guarantee on an input that violates the constraint. This extends the range of applicability of the most prominent existing correlation-clustering algorithms, including the popular Pivot, thus providing both theoretical and practical benefits. Experiments demonstrate the usefulness of our approach, in terms of both the worthiness of employing existing efficient algorithms and guidance on the definition of weights from feature vectors in a fair-clustering task.
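The objective, the probability constraint, and the Pivot algorithm mentioned in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the authors' code: objects are the integers 0..n-1, weights are dictionaries keyed by pairs (u, v) with u < v, and the function names `disagreement_cost`, `satisfies_probability_constraint` and `pivot` are hypothetical. The Pivot variant shown treats a pair as "positive" when its positive-type weight exceeds 1/2 (majority rounding), which, to our understanding, is the weighted variant for which [5] proves the expected 5-approximation under the probability constraint. The paper's global weight bounds and its transformation of a constraint-violating input are not reproduced here.

```python
import random
from itertools import combinations


def disagreement_cost(labels, w_pos, w_neg):
    """Min-disagreement cost: negative-type weights of intra-cluster pairs
    plus positive-type weights of inter-cluster pairs.

    labels[u] is the cluster id of object u (objects are 0..n-1); w_pos and
    w_neg map a pair (u, v) with u < v to its weight (missing pairs = 0).
    """
    cost = 0.0
    for u, v in combinations(range(len(labels)), 2):
        if labels[u] == labels[v]:
            cost += w_neg.get((u, v), 0.0)  # kept together despite negative weight
        else:
            cost += w_pos.get((u, v), 0.0)  # split apart despite positive weight
    return cost


def satisfies_probability_constraint(n, w_pos, w_neg, tol=1e-9):
    """Local bound: w+_uv + w-_uv == 1 must hold for every single pair."""
    return all(
        abs(w_pos.get((u, v), 0.0) + w_neg.get((u, v), 0.0) - 1.0) <= tol
        for u, v in combinations(range(n), 2)
    )


def pivot(n, w_pos, seed=None):
    """Pivot (KwikCluster) with majority rounding. Assumes the probability
    constraint, so w+_uv > 1/2 means the positive weight dominates."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # scanning a random permutation = uniform random pivots
    remaining = set(order)
    labels = [None] * n
    next_id = 0
    for p in order:
        if p not in remaining:
            continue  # already absorbed by an earlier pivot
        # The pivot takes every remaining object it is "positively" linked to
        cluster = [v for v in remaining
                   if v == p or w_pos.get((min(p, v), max(p, v)), 0.0) > 0.5]
        for v in cluster:
            labels[v] = next_id
            remaining.discard(v)
        next_id += 1
    return labels


if __name__ == "__main__":
    # Hypothetical toy instance: objects 0, 1, 2 attract each other, 3 is apart
    w_pos = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7,
             (0, 3): 0.1, (1, 3): 0.2, (2, 3): 0.1}
    w_neg = {k: 1.0 - v for k, v in w_pos.items()}
    labels = pivot(4, w_pos, seed=0)
    print(labels, disagreement_cost(labels, w_pos, w_neg))
```

Because Pivot is randomized, the cluster ids it prints depend on the seed; on this toy instance every pivot order groups objects 0, 1 and 2 and leaves object 3 alone.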

Notes

  1.

    In fact, a probability-constraint-compliant graph \(G'\) can be derived from \(G\) in linear time and space (statement (i) of our result). Pivot on \(G'\) yields a 5-approximate clustering in expectation [5]. A 5-approximate clustering on \(G'\) is also a 5-approximate clustering on \(G\) (statement (ii) of our result).

  2.

    Publicly available at http://konect.cc/networks/.

  3.

    Experiments were carried out on the Cresco6 cluster https://www.eneagrid.enea.it.

  4.

    https://archive.ics.uci.edu/ml/index.php.

  5.

    https://www.kaggle.com/sakshigoyal7/credit-card-customers.

  6.

    The cluster-size-weighted average, over all clusters, of the per-attribute averages of the Euclidean distances between the attribute-value frequency vector computed over the objects of a cluster and the frequency vector computed over the whole object set [1].
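Note 6 describes the fairness measure of [1] only in words, so the following Python sketch encodes one possible reading of it. It is a hypothetical illustration, not the authors' code: `labels[i]` is the cluster of object i, `attrs[i]` is a tuple of its sensitive-attribute values, the frequency vector of an attribute lists the relative frequency of each of its observed values, and for each cluster the Euclidean distances are averaged over attributes before the cluster-size-weighted average is taken. Lower values mean that clusters mirror the global attribute distribution more closely.

```python
import math
from collections import Counter


def freq_vector(values, domain):
    """Relative frequency of each value of `domain` among `values`."""
    counts = Counter(values)
    return [counts[d] / len(values) for d in domain]


def fairness_deviation(labels, attrs):
    """Cluster-size-weighted average, over clusters, of the per-attribute
    average Euclidean distance between a cluster's frequency vector and the
    frequency vector over the whole object set (0 = perfectly balanced)."""
    n, m = len(labels), len(attrs[0])
    # Value domain and global frequency vector of each sensitive attribute
    domains = [sorted({a[j] for a in attrs}) for j in range(m)]
    global_f = [freq_vector([a[j] for a in attrs], domains[j]) for j in range(m)]
    members = {}
    for i, c in enumerate(labels):
        members.setdefault(c, []).append(i)
    total = 0.0
    for idx in members.values():
        per_attr = [
            math.dist(freq_vector([attrs[i][j] for i in idx], domains[j]), global_f[j])
            for j in range(m)
        ]
        total += (len(idx) / n) * (sum(per_attr) / m)  # weight by cluster size
    return total
```

For example, with one binary attribute and objects ("a",), ("b",), ("a",), ("b",), the clustering [0, 0, 1, 1] scores 0 because each cluster has the global 50/50 split, whereas [0, 1, 0, 1] puts only "a" values in one cluster and only "b" values in the other, and therefore scores higher. `math.dist` requires Python 3.8 or later.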

References

  1. Abraham, S.S., Sundaram, S.S.: Fairness in clustering with multiple sensitive attributes. In: Proceedings of EDBT Conference, pp. 287–298 (2020)

  2. Ahmadian, S., et al.: Fair hierarchical clustering. In: Proceedings of NIPS Conference (2020)

  3. Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of ACM KDD Conference, pp. 267–275 (2019)

  4. Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Fair correlation clustering. In: Proceedings of AISTATS Conference, pp. 4195–4205 (2020)

  5. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. JACM 55(5), 23:1–23:27 (2008)

  6. Ausiello, G., Marchetti-Spaccamela, A., Crescenzi, P., Gambosi, G., Protasi, M., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-58412-1

  7. Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: Proceedings of ICML Conference, pp. 405–413 (2019)

  8. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1), 89–113 (2004)

  9. Bera, S.K., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Proceedings of NIPS Conference, pp. 4955–4966 (2019)

  10. Bonchi, F., García-Soriano, D., Liberty, E.: Correlation clustering: from theory to practice. In: Proceedings of ACM KDD Conference, p. 1972 (2014)

  11. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. JCSS 71(3), 360–383 (2005)

  12. Chawla, S., Makarychev, K., Schramm, T., Yaroslavtsev, G.: Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In: Proceedings of ACM STOC Symposium, pp. 219–228 (2015)

  13. Chen, X., Fain, B., Lyu, L., Munagala, K.: Proportionally fair clustering. In: Proceedings of ICML Conference, pp. 1032–1041 (2019)

  14. Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of NIPS Conference, pp. 5029–5037 (2017)

  15. Crescenzi, P.: A short guide to approximation preserving reductions. In: Proceedings of IEEE CCC Conference, pp. 262–273 (1997)

  16. Demaine, E.D., Emanuel, D., Fiat, A., Immorlica, N.: Correlation clustering in general weighted graphs. TCS 361(2–3), 172–187 (2006)

  17. Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM TKDD 1(1), 4 (2007)

  18. Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization. In: Proceedings of ICML Conference, pp. 3448–3457 (2019)

  19. Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: Proceedings of ICML Conference, pp. 3458–3467 (2019)

  20. Kollios, G., Potamias, M., Terzi, E.: Clustering large probabilistic graphs. IEEE TKDE 25(2), 325–336 (2013)

  21. Mandaglio, D., Tagarelli, A., Gullo, F.: In and out: optimizing overall interaction in probabilistic graphs under clustering constraints. In: Proceedings of ACM KDD Conference, pp. 1371–1381 (2020)

  22. Pandove, D., Goel, S., Rani, R.: Correlation clustering methodologies and their fundamental results. Expert. Syst. 35(1), e12229 (2018)

  23. Puleo, G.J., Milenkovic, O.: Correlation clustering with constrained cluster sizes and extended weights bounds. SIAM J. Optim. 25(3), 1857–1872 (2015)

  24. Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of ICALP Colloquium, vol. 107, pp. 96:1–96:14 (2018)

  25. Shamir, R., Sharan, R., Tsur, D.: Cluster graph modification problems. Discret. Appl. Math. 144(1–2), 173–182 (2004)

  26. Swamy, C.: Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of ACM-SIAM SODA Conference, pp. 526–527 (2004)

  27. van Zuylen, A., Williamson, D.P.: Deterministic algorithms for rank aggregation and other ranking and clustering problems. In: Proceedings of WAOA, pp. 260–273 (2007)

Author information

Correspondence to Andrea Tagarelli.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mandaglio, D., Tagarelli, A., Gullo, F. (2021). Correlation Clustering with Global Weight Bounds. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12976. Springer, Cham. https://doi.org/10.1007/978-3-030-86520-7_31

  • DOI: https://doi.org/10.1007/978-3-030-86520-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86519-1

  • Online ISBN: 978-3-030-86520-7

  • eBook Packages: Computer Science, Computer Science (R0)