Abstract
A bipartite graph consists of two disjoint vertex sets, with edges only between vertices from different sets. In this paper, we study the counting problems of two common types of motifs in bipartite graphs: (i) butterflies (2x2 bicliques) and (ii) bi-triangles (length-6 cycles). Most existing algorithms aim to obtain exact counts. Our goal is instead to obtain estimates of these counts that are precise enough for practical use, since such estimates already suffice, and are highly useful, in various applications. Approximate algorithms for butterfly counting do exist, but they are mainly based on techniques designed for general graphs and are therefore less effective on bipartite graphs. Moreover, approximate bi-triangle counting has received little study. Motivated by this, we first propose a novel butterfly counting algorithm, called one-sided weighted sampling, which is tailored to bipartite graphs. Its basic idea is to estimate the total butterfly count from the number of butterflies containing two randomly sampled vertices from the same vertex set. We prove that our estimator is unbiased, and we extend the technique (non-trivially) to bi-triangle count estimation. Theoretical analyses under a power-law random bipartite graph model and extensive experiments on multiple large real datasets demonstrate that our approximate counting algorithms achieve high accuracy while running up to three (resp. four) orders of magnitude faster than the state-of-the-art exact butterfly (resp. bi-triangle) counting algorithms. Additionally, we present an approximate clustering coefficient estimation framework for bipartite graphs, which achieves a similar speed-up over the exact solutions with less than 1% relative error.
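The sampling idea above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes the two same-side vertices are drawn independently with probability proportional to their degree (the paper's actual weighting scheme may differ), and every function name here is hypothetical. For a pair {u, w} of left-side vertices with c common neighbours, exactly C(c, 2) butterflies contain both u and w, so summing C(c, 2) over all pairs gives the total count; dividing a sampled pair's contribution by its sampling probability makes each sample an unbiased estimate of that total. A brute-force exact counter is included only for checking on small graphs.

```python
import random
from collections import defaultdict
from itertools import accumulate, combinations
from math import comb


def build_adjacency(edges):
    """Map each left-side vertex to the set of its right-side neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return adj


def exact_butterflies(adj):
    """Brute force: sum over unordered left pairs {u, w} of C(|N(u) & N(w)|, 2).

    Quadratic in the number of left vertices, so only suitable for small checks.
    """
    return sum(comb(len(adj[u] & adj[w]), 2) for u, w in combinations(adj, 2))


def one_sided_weighted_estimate(adj, num_samples, seed=None):
    """Unbiased butterfly estimate from weighted samples of left-vertex pairs.

    ASSUMPTION: each vertex is drawn with probability d(u) / sum of degrees.
    This is an illustrative choice, not necessarily the paper's weighting.
    """
    vertices = list(adj)
    degrees = [len(adj[u]) for u in vertices]
    total = sum(degrees)
    if total == 0 or num_samples <= 0:
        return 0.0
    probs = [d / total for d in degrees]
    cum_weights = list(accumulate(degrees))  # precomputed for fast repeated draws
    indices = range(len(vertices))
    rng = random.Random(seed)

    running_sum = 0.0
    for _ in range(num_samples):
        i, j = rng.choices(indices, cum_weights=cum_weights, k=2)
        if i == j:
            continue  # a vertex paired with itself contributes 0 (sample still counts)
        common = len(adj[vertices[i]] & adj[vertices[j]])
        # The unordered pair {i, j} is drawn as (i, j) or (j, i), hence the factor 2.
        running_sum += comb(common, 2) / (2 * probs[i] * probs[j])
    return running_sum / num_samples
```

For example, on a small graph where left vertices a and b share three neighbours and c shares two of them:

```python
edges = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3),
         ("c", 2), ("c", 3), ("d", 1)]
adj = build_adjacency(edges)
print(exact_butterflies(adj))                      # 5
print(one_sided_weighted_estimate(adj, 50000, 7))  # close to 5
```

Degree-proportional weights are a natural heuristic here because high-degree vertices tend to share more neighbours and therefore carry most of the butterfly mass; sampling them more often typically lowers the variance compared with uniform pair sampling, while the inverse-probability correction keeps the estimate unbiased.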
Index Terms
- Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite Networks