DOI: 10.1145/3488560.3498390
research-article

Finding a Concise, Precise, and Exhaustive Set of Near Bi-Cliques in Dynamic Graphs

Published: 15 February 2022

Abstract

A variety of tasks on dynamic graphs, including anomaly detection, community detection, compression, and graph understanding, have been formulated as problems of identifying constituent (near) bi-cliques (i.e., complete bipartite graphs). Even when we restrict our attention to maximal ones, there can be exponentially many near bi-cliques, so finding all of them is practically impossible for large graphs. Two questions then naturally arise: (Q1) What is a "good" set of near bi-cliques? That is, given a set of near bi-cliques in the input dynamic graph, how should we evaluate its quality? (Q2) Given a large dynamic graph, how can we rapidly identify a high-quality set of near bi-cliques in it? Regarding Q1, we measure how concisely, precisely, and exhaustively a given set of near bi-cliques describes the input dynamic graph, and we combine these three perspectives systematically based on the Minimum Description Length (MDL) principle. Regarding Q2, we propose CutNPeel, a fast search algorithm for a high-quality set of near bi-cliques. By adaptively re-partitioning the input graph, CutNPeel reduces the search space and at the same time improves the search quality. Our experiments using six real-world dynamic graphs demonstrate that CutNPeel is (a) High-quality: providing near bi-cliques of up to 51.2% better quality than its state-of-the-art competitors, (b) Fast: up to 68.8× faster than the next-best competitor, and (c) Scalable: scaling to graphs with 134 million edges. We also show successful applications of CutNPeel to graph compression and pattern discovery.
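To make the three perspectives in Q1 concrete, the following is a minimal sketch of an MDL-flavored cost, not the encoding defined in the paper. It assumes, purely for illustration, that a dynamic graph is a set of `(source, destination, time)` triples and that a near bi-clique is a triple of sets `(sources, destinations, times)`. The function name `toy_description_cost` and the unit costs (one unit per listed member, one unit per error) are hypothetical.

```python
from itertools import product


def toy_description_cost(edges, blocks):
    """Toy MDL-style cost in abstract units (not bits).

    edges:  iterable of (source, destination, time) triples.
    blocks: list of (sources, destinations, times) set triples;
            this representation is an assumption for illustration.
    """
    edges = set(edges)

    # Conciseness: cost of listing the members of every block.
    model_cost = sum(len(s) + len(d) + len(t) for s, d, t in blocks)

    # Every triple that the blocks claim to be an edge.
    covered = set()
    for s, d, t in blocks:
        covered.update(product(s, d, t))

    # Precision: claimed triples that are absent from the graph.
    false_claims = len(covered - edges)

    # Exhaustiveness: real edges that no block accounts for.
    missed = len(edges - covered)

    return model_cost + false_claims + missed


# A 3 x 3 x 2 block with one missing edge (17 edges in total).
edges = set(product("abc", "xyz", (1, 2))) - {("c", "z", 2)}
block = ({"a", "b", "c"}, {"x", "y", "z"}, {1, 2})

with_block = toy_description_cost(edges, [block])   # 8 members + 1 false claim
without_block = toy_description_cost(edges, [])     # 17 uncovered edges
```

Under these assumed unit costs, describing the graph with the single block costs less than listing every edge individually, which is the intuition behind preferring a small set of accurate, high-coverage near bi-cliques.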

Supplementary Material

MP4 File (WSDM_CutNPeel.mp4)
Presentation video of the paper 'Finding a Concise, Precise, and Exhaustive Set of Near Bi-Cliques in Dynamic Graphs'

Published In

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
February 2022
1690 pages
ISBN:9781450391320
DOI:10.1145/3488560
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bi-clique
  2. dynamic graph
  3. graph compression
  4. pattern discovery

Qualifiers

  • Research-article

Conference

WSDM '22

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%
