DOI: 10.1145/3589334.3645374
research-article

A Similarity-based Approach for Efficient Large Quasi-clique Detection

Published: 13 May 2024

Abstract

Identifying dense subgraphs called quasi-cliques is pivotal in various graph mining tasks across domains like biology, social networks, and e-commerce. However, recent algorithms still suffer from efficiency issues when mining large quasi-cliques in massive and complex graphs. Our key insight is that vertices within a quasi-clique exhibit similar neighborhoods to some extent. Based on this, we introduce NBSim and FastNBSim, efficient algorithms that find near-maximum quasi-cliques by exploiting vertex neighborhood similarity. FastNBSim further uses MinHash approximations to reduce the time complexity for similarity computation. Empirical evaluation on 10 real-world graphs shows that our algorithms deliver up to three orders of magnitude speedup versus the state-of-the-art algorithms, while ensuring high-quality quasi-clique extraction.
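
The neighborhood-similarity idea in the abstract can be illustrated with a small sketch. This is not the authors' NBSim or FastNBSim; it only shows exact Jaccard similarity between closed vertex neighborhoods and a MinHash estimate of the same quantity. All function names, the toy graph, the use of closed neighborhoods, the edge-density notion of "dense", and the hash family `(a*x + b) mod P` are assumptions made for illustration.

```python
import random
from itertools import combinations

P = (1 << 61) - 1  # large Mersenne prime for the illustrative hash family


def closed_neighborhood(adj, v):
    """N[v]: the neighbors of v plus v itself (an assumed convention)."""
    return adj[v] | {v}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_hashes(k, seed=0):
    """Draw k hash functions h(x) = (a*x + b) mod P with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]


def signature(items, hashes):
    """MinHash signature: the minimum hash value per hash function.

    Assumes items are non-negative integers smaller than P.
    """
    return [min((a * x + b) % P for x in items) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def edge_density(adj, nodes):
    """Edges inside `nodes` divided by the number of vertex pairs."""
    nodes = set(nodes)
    pairs = len(nodes) * (len(nodes) - 1) // 2
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / pairs if pairs else 0.0


# Toy graph: a 5-clique on vertices 0..4 with a short tail 4-5-6.
adj = {v: set() for v in range(7)}
for u, v in list(combinations(range(5), 2)) + [(4, 5), (5, 6)]:
    adj[u].add(v)
    adj[v].add(u)

hashes = make_hashes(256)
sig = {v: signature(closed_neighborhood(adj, v), hashes) for v in adj}

for u, v in [(0, 1), (0, 4), (0, 6)]:
    exact = jaccard(closed_neighborhood(adj, u), closed_neighborhood(adj, v))
    approx = estimate_jaccard(sig[u], sig[v])
    print(f"vertices {u},{v}: exact={exact:.3f} minhash={approx:.3f}")
```

In this toy graph, vertices inside the clique have nearly identical closed neighborhoods, while a vertex on the tail shares none with them. Comparing signatures costs time proportional to the signature length rather than to the neighborhood sizes, which is the general reason MinHash is used to speed up similarity computation.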

Published In

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN:9798400701719
DOI:10.1145/3589334
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. minhash
  2. neighborhoods
  3. quasi-cliques
  4. similarity

Qualifiers

  • Research-article

Conference

WWW '24
WWW '24: The ACM Web Conference 2024
May 13 - 17, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
