skip to main content
research-article
Open Access

Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite Networks

Published:12 December 2023Publication History
Skip Abstract Section

Abstract

A bipartite graph is a graph that consists of two disjoint sets of vertices and only edges between vertices from different vertex sets. In this paper, we study the counting problems of two common types of em motifs in bipartite graphs: (i) butterflies (2x2 bicliques) and (ii) bi-triangles (length-6 cycles). Unlike most of the existing algorithms that aim to obtain exact counts, our goal is to obtain precise enough estimations of these counts in bipartite graphs, as such estimations are already sufficient and of great usefulness in various applications. While there exist approximate algorithms for butterfly counting, these algorithms are mainly based on the techniques designed for general graphs, and hence, they are less effective on bipartite graphs. Not to mention that there is still a lack of study on approximate bi-triangle counting. Motivated by this, we first propose a novel butterfly counting algorithm, called one-sided weighted sampling, which is tailored for bipartite graphs. The basic idea of this algorithm is to estimate the total butterfly count with the number of butterflies containing two randomly sampled vertices from the same side of the two vertex sets. We prove that our estimation is unbiased, and our technique can be further extended (non-trivially) for bi-triangle count estimation. Theoretical analyses under a power-law random bipartite graph model and extensive experiments on multiple large real datasets demonstrate that our proposed approximate counting algorithms can reach high accuracy, yet achieve up to three orders (resp. four orders) of magnitude speed-up over the state-of-the-art exact butterfly (resp. bi-triangle) counting algorithms. Additionally, we present an approximate clustering coefficient estimation framework for bipartite graphs, which shows a similar speed-up over the exact solutions with less than 1% relative error.

References

  1. 2013. KONECT. http://konect.cc/networks/.Google ScholarGoogle Scholar
  2. 2023. Code and technical report. https://github.com/CUHK-DBGroup/SIGMOD24-Butterfly-Bi-Triangle-Counting.Google ScholarGoogle Scholar
  3. Nesreen K. Ahmed, Nick G. Duffield, Jennifer Neville, and Ramana Rao Kompella. 2014. Graph sample and hold: a framework for big-graph analytics. In KDD. 1446--1455.Google ScholarGoogle Scholar
  4. William Aiello, Fan R. K. Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In STOC. 171--180.Google ScholarGoogle Scholar
  5. Sinan Aksoy, Tamara G. Kolda, and Ali Pinar. 2017. Measuring and modeling bipartite graphs with community structure. J. Complex Networks 5, 4 (2017), 581--603.Google ScholarGoogle ScholarCross RefCross Ref
  6. Michael J Barber. 2007. Modularity and community detection in bipartite networks. Physical Review E 76, 6 (2007), 066102.Google ScholarGoogle ScholarCross RefCross Ref
  7. Rémi Bardenet and Odalric-Ambrym Maillard. 2015. Concentration inequalities for sampling without replacement. Bernoulli 21, 3 (2015), 1361--1385.Google ScholarGoogle ScholarCross RefCross Ref
  8. Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD. 16--24.Google ScholarGoogle Scholar
  9. Etienne Birmelé. 2009. A scale-free graph model based on bipartite graphs. Discret. Appl. Math. 157, 10 (2009), 2267--2284.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Stephen P Borgatti and Martin G Everett. 1997. Network analysis of 2-mode data. Social networks 19, 3 (1997), 243--269.Google ScholarGoogle Scholar
  11. Sudarshan S. Chawathe and Hector Garcia-Molina. 1997. Meaningful Change Detection in Structured Data. In SIGMOD. 26--37.Google ScholarGoogle Scholar
  12. Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785--794.Google ScholarGoogle Scholar
  13. Xingguang Chen and Sibo Wang. 2021. Efficient Approximate Algorithms for Empirical Entropy and Mutual Information. In SIGMOD. 274--286.Google ScholarGoogle Scholar
  14. Xingguang Chen, Fangyuan Zhang, and Sibo Wang. 2022. Efficient Approximate Algorithms for Empirical Variance with Hashed Block Sampling. In SIGKDD. 157--167.Google ScholarGoogle Scholar
  15. Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM Journal on computing 14, 1 (1985), 210--223.Google ScholarGoogle Scholar
  16. Corinna Cortes and Vladimir Vapnik. 1995. Support-Vector Networks. Mach. Learn. 20, 3 (1995), 273--297.Google ScholarGoogle ScholarCross RefCross Ref
  17. Hongbo Deng, Michael R. Lyu, and Irwin King. 2009. A generalized Co-HITS algorithm and its application to bipartite graphs. In SIGKDD. 239--248.Google ScholarGoogle Scholar
  18. Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In SIGKDD. 269--274.Google ScholarGoogle Scholar
  19. Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, and Xuemin Lin. 2020. A survey of community search over big graphs. VLDB J. 29, 1 (2020), 353--392.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Yixiang Fang, Kaiqiang Yu, Reynold Cheng, Laks V. S. Lakshmanan, and Xuemin Lin. 2019. Efficient Algorithms for Densest Subgraph Discovery. Proc. VLDB Endow. 12, 11 (2019), 1719--1732.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Xiaoli Zhang Fern and Carla E. Brodley. 2004. Solving cluster ensemble problems by bipartite graph partitioning. In ICML.Google ScholarGoogle Scholar
  22. Qintian Guo, Sibo Wang, Zhewei Wei, and Ming Chen. 2020. Influence Maximization Revisited: Efficient Reverse Reachable Set Generation with Bound Tightened. In SIGMOD. 2167--2181.Google ScholarGoogle Scholar
  23. Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, and Jing Tang. 2022. Influence Maximization Revisited: Efficient Sampling with Bound Tightened. ACM Trans. Database Syst. 47, 3 (2022), 12:1--12:45.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mohammad Al Hasan and Vachik S. Dave. 2018. Triangle counting in large networks: a review. WIREs Data Mining Knowl. Discov. 8, 2 (2018).Google ScholarGoogle Scholar
  25. Paul W Holland and Samuel Leinhardt. 1976. Local structure in social networks. Sociological methodology 7 (1976), 1--45.Google ScholarGoogle Scholar
  26. Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. FRAUDAR: Bounding Graph Fraud in the Face of Camouflage. In SIGKDD. 895--904.Google ScholarGoogle Scholar
  27. Guanhao Hou, Qintian Guo, Fangyuan Zhang, Sibo Wang, and Zhewei Wei. 2023. Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme. Proc. ACM Manag. Data 1, 1 (2023), 25:1--25:26.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In SIGMOD. 325--336.Google ScholarGoogle Scholar
  29. Chu-Yi Huang, Yen-Shen Chen, Youn-Long Lin, and Yu-Chin Hsu. 1990. Data Path Allocation Based on Bipartite Weighted Matching. In DAC. IEEE Computer Society Press, 499--504.Google ScholarGoogle Scholar
  30. Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In SIGMOD. 1311--1322.Google ScholarGoogle Scholar
  31. Xin Huang, Wei Lu, and Laks V. S. Lakshmanan. 2016. Truss Decomposition of Probabilistic Graphs: Semantics and Algorithms. In SIGMOD. 77--90.Google ScholarGoogle Scholar
  32. Alon Itai. 1977. Finding a Minimum Circuit in a Graph. In STOC. 1--10.Google ScholarGoogle Scholar
  33. Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. 1986. Random Generation of Combinatorial Structures from a Uniform Distribution. Theor. Comput. Sci. 43 (1986), 169--188.Google ScholarGoogle ScholarCross RefCross Ref
  34. Yuli Jiang, Yu Rong, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang. 2022. Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed. Proc. VLDB Endow. 15, 6 (2022), 1243--1255.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Tamara G. Kolda, Ali Pinar, and C. Seshadhri. 2013. Triadic Measures on Graphs: The Power of Wedge Sampling. In SDM. 10--18.Google ScholarGoogle Scholar
  36. Jérôme Kunegis. 2013. KONECT: the Koblenz network collection. In WWW. 1343--1350.Google ScholarGoogle Scholar
  37. Los Alamos National Laboratory. 2023. Networkx. https://networkx.org/.Google ScholarGoogle Scholar
  38. Matthieu Latapy, Clémence Magnien, and Nathalie Del Vecchio. 2008. Basic notions for the analysis of large two-mode networks. Social networks 30, 1 (2008), 31--48.Google ScholarGoogle Scholar
  39. David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. 2004. RCV1: A New Benchmark Collection for Text Categorization Research. J. Mach. Learn. Res. 5 (2004), 361--397.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. 2016. Wander Join: Online Aggregation via Random Walks. In SIGMOD. 615--629.Google ScholarGoogle Scholar
  41. Pedro G Lind, Marta C González, and Hans J Herrmann. 2005. Cycles and clustering in bipartite networks. Physical review E 72, 5 (2005), 056127.Google ScholarGoogle Scholar
  42. Boge Liu, Long Yuan, Xuemin Lin, Lu Qin, Wenjie Zhang, and Jingren Zhou. 2019. Efficient (?,?)-core Computation: an Index-based Approach. In WWW. 1130--1141.Google ScholarGoogle Scholar
  43. Xin Liu and Tsuyoshi Murata. 2009. Community Detection in Large-Scale Bipartite Networks. In Web Intelligence. 50--57.Google ScholarGoogle Scholar
  44. Bingqing Lyu, Lu Qin, Xuemin Lin, Ying Zhang, Zhengping Qian, and Jingren Zhou. 2020. Maximum Biclique Search at Billion Scale. Proc. VLDB Endow. 13, 9 (2020), 1359--1372.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Mohammad Mahdian and Qiqi Yan. 2011. Online bipartite matching with random arrivals: an approach based on strongly factor-revealing LPs. In STOC. 597--606.Google ScholarGoogle Scholar
  46. Charles Masson, Jee E. Rim, and Homin K. Lee. 2019. DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees. Proc. VLDB Endow. 12, 12 (2019), 2195--2205.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824--827.Google ScholarGoogle Scholar
  48. Tore Opsahl. 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Soc. Networks (2013), 159--167.Google ScholarGoogle Scholar
  49. Michael D Ornstein. 1982. Interlocking directorates in Canada: evidence from replacement patterns. Social Networks 4, 1 (1982), 3--25.Google ScholarGoogle ScholarCross RefCross Ref
  50. Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a MapReduce implementation. Inf. Process. Lett. 112, 7 (2012), 277--281.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 2 (2007), e177--e183.Google ScholarGoogle Scholar
  52. Pedro Ribeiro, Pedro Paredes, Miguel EP Silva, David Aparicio, and Fernando Silva. 2021. A survey on subgraph counting: concepts, algorithms, and applications to network motifs and graphlets. ACM Computing Surveys (CSUR) 54, 2 (2021), 1--36.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Garry Robins and Malcolm Alexander. 2004. Small Worlds Among Interlocking Directors: Network Structure and Distance in Bipartite Graphs. Comput. Math. Organ. Theory 10, 1 (2004), 69--94.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Boyu Ruan, Junhao Gan, Hao Wu, and Anthony Wirth. 2021. Dynamic Structural Clustering on Graphs. In SIGMOD. 1491--1503.Google ScholarGoogle Scholar
  55. Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyüce, and Srikanta Tirthapura. 2018. Butterfly Counting in Bipartite Networks. In SIGKDD. 2150--2159.Google ScholarGoogle Scholar
  56. Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyüce, and Srikanta Tirthapura. 2019. FLEET: Butterfly Estimation from a Bipartite Graph Stream. In CIKM. 1201--1210.Google ScholarGoogle Scholar
  57. Ahmet Erdem Sariyüce and Ali Pinar. 2018. Peeling Bipartite Networks for Dense Subgraph Discovery. In WSDM. 504--512.Google ScholarGoogle Scholar
  58. Thomas Schank and Dorothea Wagner. 2005. Approximating Clustering Coefficient and Transitivity. J. Graph Algorithms Appl. 9, 2 (2005), 265--275.Google ScholarGoogle ScholarCross RefCross Ref
  59. Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In AISTATS (JMLR Proceedings, Vol. 5). 488--495.Google ScholarGoogle Scholar
  60. Aida Sheshbolouki and M. Tamer Özsu. 2022. sGrapp: Butterfly Approximation in Streaming Graphs. ACM Trans. Knowl. Discov. Data 16, 4 (2022), 76:1--76:43.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Jessica Shi and Julian Shun. 2020. Parallel Algorithms for Butterfly Computations. In APOCS. SIAM, 16--30.Google ScholarGoogle Scholar
  62. Julian Shun and Kanat Tangwongsan. 2015. Multicore triangle computations without tuning. In ICDE. 149--160.Google ScholarGoogle Scholar
  63. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM. 418--425.Google ScholarGoogle Scholar
  64. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW. 607--614.Google ScholarGoogle Scholar
  65. Amos Tanay, Roded Sharan, and Ron Shamir. 2002. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18, suppl_1 (2002), S136--S144.Google ScholarGoogle ScholarCross RefCross Ref
  66. Youze Tang, Yanchen Shi, and Xiaokui Xiao. 2015. Influence Maximization in Near-Linear Time: A Martingale Approach. In SIGMOD. 1539--1554.Google ScholarGoogle Scholar
  67. Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos. 2009. DOULION: counting triangles in massive graphs with a coin. In SIGKDD. 837--846.Google ScholarGoogle Scholar
  68. Duru Türkoglu and Ata Turk. 2017. Edge-Based Wedge Sampling to Estimate Triangle Counts in Very Large Graphs. In ICDM. 455--464.Google ScholarGoogle Scholar
  69. Johan Ugander, Lars Backstrom, and Jon M. Kleinberg. 2013. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. In WWW. 1307--1318.Google ScholarGoogle Scholar
  70. Demival Vasques Filho and Dion RJ O'Neale. 2018. Degree distributions of bipartite networks and their projections. Physical Review E 98, 2 (2018), 022307.Google ScholarGoogle ScholarCross RefCross Ref
  71. Alastair J. Walker. 1977. An Efficient Method for Generating Discrete Random Variables with General Distributions. ACM Trans. Math. Softw. 3, 3 (1977), 253--256.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Jia Wang, Ada Wai-Chee Fu, and James Cheng. 2014. Rectangle Counting in Large Bipartite Graphs. In IEEE International Congress on Big Data. 17--24.Google ScholarGoogle Scholar
  73. Kai Wang, Yiheng Hu, Xuemin Lin, Wenjie Zhang, Lu Qin, and Ying Zhang. 2021. A Cohesive Structure Based Bipartite Graph Analytics System. In CIKM. 4799--4803.Google ScholarGoogle Scholar
  74. Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2019. Vertex Priority Based Butterfly Counting for Large-scale Bipartite Networks. Proc. VLDB Endow. (2019), 1139--1152.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2020. Efficient Bitruss Decomposition for Large-scale Bipartite Graphs. In ICDE. 661--672.Google ScholarGoogle Scholar
  76. Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2022. Accelerated butterfly counting with vertex priority on bipartite graphs. VLDB J. (2022).Google ScholarGoogle Scholar
  77. Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2022. Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs. VLDB J. 31, 2 (2022), 203--226.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. 2016. HubPPR: Effective Indexing for Approximate Personalized PageRank. Proc. VLDB Endow. 10, 3 (2016), 205--216.Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. 2017. FORA: Simple and Effective Approximate Single-Source Personalized PageRank. In SIGKDD. 505--514.Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In SIGIR. 165--174.Google ScholarGoogle Scholar
  81. Qingyu Xu, Feng Zhang, Zhiming Yao, Lv Lu, Xiaoyong Du, Dong Deng, and Bingsheng He. 2022. Efficient Load-Balanced Butterfly Counting on GPU. Proc. VLDB Endow. 15, 11 (2022), 2450--2462.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Jianye Yang, Yun Peng, and Wenjie Zhang. 2021. (p, q)-biclique Counting and Enumeration for Large Sparse Bipartite Graphs. Proc. VLDB Endow. 15, 2 (2021), 141--153.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Yixing Yang, Yixiang Fang, Xuemin Lin, and Wenjie Zhang. 2020. Effective and Efficient Truss Computation over Large Heterogeneous Information Networks. In ICDE. 901--912.Google ScholarGoogle Scholar
  84. Yixing Yang, Yixiang Fang, Maria E. Orlowska, Wenjie Zhang, and Xuemin Lin. 2021. Efficient Bi-triangle Counting for Large Bipartite Networks. Proc. VLDB Endow. (2021), 984--996.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Fangyuan Zhang, Mengxu Jiang, and Sibo Wang. 2023. Efficient Dynamic Weighted Set Sampling and Its Extension. Proc. VLDB Endow. 17, 1 (2023), 15--27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Fangyuan Zhang and Sibo Wang. 2022. Effective Indexing for Dynamic Structural Graph Clustering. Proc. VLDB Endow. 15, 11 (2022), 2908--2920.Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. Yun Zhang, Charles A Phillips, Gary L Rogers, Erich J Baker, Elissa J Chesler, and Michael A Langston. 2014. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC bioinformatics 15, 1 (2014), 1--18.Google ScholarGoogle Scholar
  88. Alexander Zhou, Yue Wang, and Lei Chen. 2021. Butterfly Counting on Uncertain Bipartite Networks. Proc. VLDB Endow. 15, 2 (2021), 211--223.Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Tao Zhou, Jie Ren and Yi-Cheng Zhang. 2007. Bipartite network projection and personal recommendation. Physical review E 76, 4 (2007), 046115.Google ScholarGoogle Scholar

Index Terms

  1. Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite Networks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the ACM on Management of Data
      Proceedings of the ACM on Management of Data  Volume 1, Issue 4
      PACMMOD
      December 2023
      1317 pages
      EISSN:2836-6573
      DOI:10.1145/3637468
      • Editor:
      • Divyakant Agrawal
      Issue’s Table of Contents

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 December 2023
      Published in pacmmod Volume 1, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader