DOI: 10.1145/2736277.2741629

Improved Theoretical and Practical Guarantees for Chromatic Correlation Clustering

Published: 18 May 2015

Abstract

We study a natural generalization of the correlation clustering problem to graphs in which the pairwise relations between objects are categorical instead of binary. This problem was recently introduced by Bonchi et al. under the name chromatic correlation clustering, and is motivated by many real-world applications in data mining and social networks, including community detection, link classification, and entity de-duplication. Our main contribution is a fast and easy-to-implement constant-factor approximation framework for the problem, which builds on a novel reduction of the problem to correlation clustering. This result significantly advances the current state of knowledge for the problem, improving on a previous result that guaranteed only an approximation ratio linear in the input size. We complement this result with a linear programming-based algorithm that achieves an improved approximation ratio of 4. Although this algorithm cannot be considered practical, it further extends our theoretical understanding of chromatic correlation clustering. We also present a fast heuristic algorithm motivated by real-life scenarios in which a ground-truth clustering is obscured by noisy observations. We test our algorithms on both synthetic and real datasets, such as social network data. Our experiments reinforce the theoretical findings by demonstrating that our algorithms generally outperform previous approaches, both in terms of solution cost and in reconstructing an underlying ground-truth clustering.
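To make the objective concrete, the following is a minimal Python sketch of how the cost of a colored clustering might be counted. It assumes the usual disagreement-counting formulation: a pair inside a cluster is penalized unless it is an edge whose color matches the cluster's color, and a pair split across clusters is penalized if it is an edge. The abstract does not state the objective, so this formulation, the function name `chromatic_cost`, and the toy graph are illustrative assumptions rather than the paper's definitions.

```python
from itertools import combinations


def chromatic_cost(nodes, edge_color, cluster_of, cluster_color):
    """Count disagreements of a colored clustering (assumed objective).

    edge_color maps frozenset({u, v}) to a color; a missing key means no edge.
    cluster_of maps each node to a cluster id.
    cluster_color maps each cluster id to that cluster's color.
    """
    cost = 0
    for u, v in combinations(nodes, 2):
        color = edge_color.get(frozenset((u, v)))  # None means "no edge"
        if cluster_of[u] == cluster_of[v]:
            # Same cluster: the pair agrees only if it is an edge whose
            # color equals the cluster's color.
            if color != cluster_color[cluster_of[u]]:
                cost += 1
        elif color is not None:
            # Different clusters: any edge between them is a disagreement.
            cost += 1
    return cost


# Hypothetical toy instance: four nodes, two edge colors.
nodes = ["a", "b", "c", "d"]
edge_color = {
    frozenset(("a", "b")): "red",
    frozenset(("b", "c")): "red",
    frozenset(("a", "c")): "blue",
    frozenset(("c", "d")): "blue",
}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1}
cluster_color = {0: "red", 1: "blue"}

example_cost = chromatic_cost(nodes, edge_color, cluster_of, cluster_color)
```

In this toy instance the disagreements are the blue edge a-c inside the red cluster and the edge c-d cut between clusters. With a single color and every edge carrying it, the count reduces to the standard correlation clustering disagreement cost.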

      Published In

      WWW '15: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1460 pages
      ISBN:9781450334693

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Author Tags

      1. approximation algorithms
      2. categorical similarity
      3. clustering
      4. edge-labeled graphs

      Qualifiers

      • Research-article

      Acceptance Rates

      WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
