A benchmarking tool for the generation of bipartite network models with overlapping communities

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Many real-world networks display hidden community structures with potentially important implications for their dynamics. Many algorithms highly relevant to network analysis have been introduced to unveil community structures. Alternative solutions are typically assessed and compared by benchmarking the target algorithm(s) on a set of diverse networks that exhibit a broad range of controlled features, so that the assessment covers multiple representative properties. Tools have been developed to synthesize bipartite networks, but none of the previous solutions addresses the issue of generating networks with overlapping community structures. This is the motivation for the BNOC tool introduced in this paper. It allows synthesizing bipartite networks that mimic a wide range of features from real-world networks, including overlapping community structures. Multiple parameters ensure flexibility in controlling the scale and topological properties of the networks and embedded communities. BNOC’s applicability is illustrated by assessing and comparing two popular overlapping community detection algorithms on bipartite networks, namely HLC and OSLOM. Results reveal interesting features of the algorithms in this scenario and confirm the relevant role played by a suitable benchmarking tool. Finally, to validate our approach, we present results comparing networks synthesized with BNOC with those obtained with an existing benchmarking tool and with already established sets of synthetic networks, in two different scenarios.

Notes

  1. https://snap.stanford.edu/data/.

  2. http://konect.uni-koblenz.de/.

  3. http://netwiki.amath.unc.edu/SharedData/SharedData.

  4. https://networkdata.ics.uci.edu/.

  5. http://www-personal.umich.edu/~mejn/netdata/.

  6. http://vlado.fmf.uni-lj.si/pub/networks/data/.

  7. https://igraph.org.

  8. https://networkx.github.io/.

  9. http://www.numpy.org/.

  10. https://github.com/junipertcy/det_k_bisbm.

  11. The tool will be made available immediately after paper acceptance.

  12. https://dblp.uni-trier.de/.

References

  1. Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466(7307):761–764

  2. Akoglu L (2014) Quantifying political polarity based on bipartite opinion networks. In: Proceedings of the eighth international AAAI conference on weblogs and social media (ICWSM)

  3. Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) 5781 LNAI(PART 1):13–28

  4. Muscoloni A, Cannistraci CV (2018) Leveraging the nonuniform PSO network model as a benchmark for performance evaluation in community detection and link prediction. New J Phys 20(6):063022

  5. Ali AM, Alvari H, Hajibagheri A, Lakkaraj K, Sukthankar G (2014) Synthetic generators for cloning social network data. In: Proceedings of the international conference on social informatics (SocInfo)

  6. Armstrong TG, Ponnekanti V, Borthakur D, Callaghan M (2013) LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the international conference on management of data (SIGMOD), pp 1185–1196

  7. Barabasi AL, Bonabeau E (2003) Scale-free networks. Sci Am 288(5):60–69

  8. Barber MJ (2007) Modularity and community detection in bipartite networks. Phys Rev E Stat Nonlinear Soft Matter Phys 76(6):1–11

  9. Barrett CL, Beckman RJ, Khan M, Kumar VSA, Marathe MV, Stretz PE, Dutta T, Lewis B (2009) Generation and analysis of large synthetic social contact networks. In: Proceedings of the winter simulation conference, WSC ’09, pp 1003–1014

  10. Beckett SJ (2016) Improved community detection in weighted bipartite networks. R Soc Open Sci 3(1):140536

  11. Birmelé E (2009) A scale-free graph model based on bipartite graphs. Discrete Appl Math 157(10):2267–2284

  12. Boncz P (2013) LDBC: benchmarks for graph and RDF data management. In: Proceedings of the international database engineering and applications symposium, pp 1–2

  13. Capota M, Hegeman T, Iosup A, Prat-Pérez A, Erling O, Boncz P (2015) Graphalytics: a big data benchmark for graph-processing platforms. In: Proceedings of the graph data management experiences and systems (GRADES), pp 1–6

  14. Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the society for industrial and applied mathematics (SIAM) international conference on data mining (SDM), p 5

  15. Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge

  16. Cui Y, Wang X (2014) Uncovering overlapping community structures by the key bi-community and intimate degree in bipartite networks. Physica A Stat Mech Appl 407:7–14

  17. Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp P09:008

  18. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  19. Du N, Wang B, Wu B, Wang Y (2008) Overlapping community detection in bipartite networks. In: Proceedings of the international conference on web intelligence (IEEE/WIC/ACM) (60402011), pp 176–179

  20. Faleiros TP, Rossi RG, de Andrade Lopes A (2017) Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs. Pattern Recognit Lett 87(Supplement C):127–138

  21. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  22. Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Exp 21(11):1129–1164

  23. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826

  24. Grujić J (2008) Movies recommendation networks as bipartite graphs. In: Proceedings of the international conference on computational science (ICCS). Springer, Berlin, pp 576–583

  25. Hwang T, Sicotte H, Tian Z, Wu B, Kocher JP, Wigle DA, Kumar V, Kuang R (2008) Robust and efficient identification of biomarkers by classifying features on graphs. Bioinformatics 24(18):2023–2029

  26. Jonnalagadda A, Kuppusamy L (2016) A survey on game theoretic models for community detection in social networks. Soc Netw Anal Min 6(1):83

  27. Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E Stat Nonlinear Soft Matter Phys 80(5):1–11

  28. Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E Stat Nonlinear Soft Matter Phys 78(4):1–5

  29. Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PLoS One 6(4):1–18

  30. Largeron C, Mougel PN, Rabbany R, Zaïane OR (2015) Generating attributed networks with communities. PLoS One 10(4):1–21

  31. Larremore DB, Clauset A, Jacobs AZ (2014) Efficiently inferring community structure in bipartite networks. Phys Rev E 90:012805

  32. Latapy M, Magnien C, Vecchio ND (2008) Basic notions for the analysis of large two-mode networks. Soc Netw 30(1):31–48

  33. Lehmann S, Schwartz M, Hansen LK (2008) Biclique communities. Phys Rev E Stat Nonlinear Soft Matter Phys 78(1):1–9

  34. Li Z, Zhang S, Zhang X (2015) Mathematical model and algorithm for link community detection in bipartite networks. Am J Oper Res 5:421–434

  35. McDaid AF, Greene D, Hurley N (2011) Normalized mutual information to evaluate overlapping community finding algorithms. eprint arXiv:1110.2515

  36. Melamed D (2014) Community structures in bipartite networks: a dual-projection approach. PLoS One 9(5):1–5

  37. Moussiades L, Vakali A (2009) Benchmark graphs for the evaluation of clustering algorithms. In: Proceedings of the international conference on research challenges in information science (RCIS), pp 197–206

  38. Nettleton DF (2016) A synthetic data generator for online social network graphs. Soc Netw Anal Min 6(1):44

  39. Newman MEJ (2001a) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64:016131

  40. Newman MEJ (2001b) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64:016132

  41. Newman MEJ (2010) Networks: an introduction. Oxford University Press Inc, New York

  42. Pasta MQ, Zaidi F (2016) Leveraging evolution dynamics to generate benchmark complex networks with community structures. eprint arXiv:1606.01169

  43. Pérez-Rosés H, Sebé F (2014) Synthetic generation of social network data with endorsements. eprint arXiv:1411.6273

  44. Pham MD, Boncz P, Erling O (2013) S3G2: A scalable structure-correlated social graph generator. In: Proceedings in selected topics in performance evaluation and benchmarking: 4th TPC technology conference (August)

  45. Rabbany R, Takaffoli M, Fagnan J, Zaïane OR, Campello RJGB (2013) Communities validity: methodical evaluation of community mining algorithms. Soc Netw Anal Min 3(4):1039–1062

  46. Rees BS, Gallagher KB (2012) Overlapping community detection using a community optimized graph swarm. Soc Netw Anal Min 2(4):405–417

  47. Rosvall M, Delvenne JC, Schaub MT, Lambiotte R (2017) Different approaches to community detection. arXiv e-print arXiv:1712.06468

  48. Shi C, Li Y, Zhang J, Sun Y, Yu PS (2017) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37

  49. Souam F, Aitelhadj A, Baba-Ali R (2014) Dual modularity optimization for detecting overlapping communities in bipartite networks. Knowl Inf Syst 40(2):455–488

  50. Uslu T, Mehler A (2018) PolyViz: a visualization system for a special kind of multipartite graphs. In: Proceedings of the IEEE VIS 2018

  51. Valejo A, Drury B, Valverde-Rebaza J, de Andrade Lopes A (2014) Identification of related Brazilian Portuguese verb groups using overlapping community detection. In: Proceedings of the international conference on computational processing of the Portuguese language. Springer, Cham, pp 292–297

  52. Valejo A, Valverde-Rebaza JC, de Andrade Lopes A (2014) A multilevel approach for overlapping community detection. In: Proceedings of the Brazilian conference on intelligent systems (BRACIS). Springer, Berlin

  53. Valejo A, Oliveira MCRF, Filho GP, Lopes AA (2018) Multilevel approach for combinatorial optimization in bipartite network. Knowl-Based Syst 151:45–61. https://doi.org/10.1016/j.knosys.2018.03.021

  54. Yang Z, Perotti JI, Tessone CJ (2017) Hierarchical benchmark graphs for testing community detection algorithms. Phys Rev E 96:052311

  55. Zhang ZY, Ahn YY (2015) Community detection in bipartite networks using weighted symmetric binary matrix factorization. Int J Mod Phys C 26:1–14

  56. Zhong E, Fan W, Zhu Y, Yang Q (2013) Modeling the dynamics of composite social networks. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 937–945

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. This work has been partially supported by the State of São Paulo Research Foundation (FAPESP) Grants 15/14228-9 and 17/05838-3; and the Brazilian Federal Research Council (CNPq) Grants 302645/2015-2 and 301847/2017-7.

Author information

Corresponding author

Correspondence to Alan Valejo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A HNOC: Extension to k-partite and heterogeneous networks

As in the case of bipartite networks, there is a lack of benchmarking tools to create k-partite and heterogeneous networks for assessing community detection and other algorithms. k-partite networks have k vertex types, rather than just two, with edges connecting only vertices of different types. In heterogeneous networks, this restriction is dropped, i.e., edges can also occur between vertices of the same type. Since both are direct generalizations of bipartite networks, we extended BNOC to synthesize general heterogeneous networks. This extension, called HNOC, can be used to generate heterogeneous information network (HIN) models to support the development and validation of new methods. HNOC inherits BNOC’s major features as a flexible and robust resource to synthesize a variety of benchmarking networks with distinct properties in reasonable times.

A heterogeneous information network (HIN) [48] consists of m disjoint subsets of vertices of different types (called layers) and edges connecting these elements. A network \(\mathcal {X}\) with m vertex types can be partitioned into subsets \(X_i = \{ x_{i,1}, \dots , x_{i,n_i} \}\) for each type i. A HIN is represented as a graph \(G = (V, E, W)\), where \(V = \bigcup _{i=1}^m X_i\) with \(m > 2\), E is the set of edges, and W is the set of edge weights.

HINs have an inherently complex structure that can be difficult to handle and visualize. The structure of connections can be described by a network schema [48], which defines a meta-template for the network that describes its vertex and connection types. Given a graph G, its network schema, denoted \(T_G(\mathcal {A}, \mathcal {R})\), is a directed graph defined over element types \(\mathcal {A}\) with edges as relations from \(\mathcal {R}\), obtained via mapping functions \(\varphi : V \rightarrow \mathcal {A}\) and \(\psi : E \rightarrow \mathcal {R}\), respectively. Figure 18 exemplifies network schemas for a bipartite network, a k-partite network and a heterogeneous network.

Fig. 18 Network schemas: (a) bipartite network schema; (b) k-partite network schema; and (c) heterogeneous network schema
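To make the schema definition concrete, the following minimal Python sketch derives a schema from a typed graph and classifies the network according to the definitions given earlier in this appendix. It is an illustration only, not part of HNOC: the function names `network_schema` and `classify` are hypothetical, the mapping \(\varphi\) is modelled as a plain dictionary, and \(\psi\) is simplified to map each edge to the unordered pair of its endpoint types (a single undirected relation per type pair, which is an assumption).

```python
def network_schema(vertex_type, edges):
    """Return (types, relations) for a typed graph.

    vertex_type: dict mapping each vertex to its type (plays the role of phi).
    edges: iterable of (u, v) vertex pairs.
    Assumption: psi maps an edge to the unordered pair of its endpoint types.
    """
    types = set(vertex_type.values())
    relations = {
        tuple(sorted((vertex_type[u], vertex_type[v]))) for u, v in edges
    }
    return types, relations


def classify(types, relations):
    """Classify a network from its schema."""
    # Any relation between two vertices of the same type makes it heterogeneous.
    if any(a == b for a, b in relations):
        return "heterogeneous"
    # Otherwise edges only join different types: bipartite if there are two types.
    return "bipartite" if len(types) == 2 else "k-partite"


# Hypothetical author-paper-term example with one term-term edge.
vertex_type = {"a1": "Author", "p1": "Paper", "t1": "Term", "t2": "Term"}
edges = [("a1", "p1"), ("p1", "t1"), ("t1", "t2")]
types, relations = network_schema(vertex_type, edges)
print(classify(types, relations))  # heterogeneous
```

Dropping the term-term edge from this example leaves three vertex types and no same-type relation, so the same call would report a k-partite network.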

The schema explicitly identifies the m vertex types and the r connection types. Each connection type is described by the types of the two endpoints and the connection meaning, since pairs of entities can admit multiple types of connections, as in the case of multi-relational networks [56]. Thus, a HIN can be interpreted as a composition of r bipartite (or homogeneous, if both endpoints of the connections are of the same type) networks, as illustrated in Fig. 19.

Fig. 19 A HIN described as r bipartite/homogeneous networks
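The decomposition of a HIN into r sub-networks can be sketched as follows. This is a simplified illustration rather than the authors' code: the helper name `split_by_relation` is hypothetical, and each connection type is assumed to be identified only by the unordered pair of endpoint types, so multi-relational pairs are not distinguished.

```python
from collections import defaultdict


def split_by_relation(vertex_type, edges):
    """Group the edges of a typed graph into one sub-network per relation.

    Returns a dict mapping each (type_a, type_b) pair to a tuple
    (kind, edge_list), where kind is "homogeneous" when both endpoint
    types coincide and "bipartite" otherwise.
    """
    blocks = defaultdict(list)
    for u, v in edges:
        relation = tuple(sorted((vertex_type[u], vertex_type[v])))
        blocks[relation].append((u, v))
    return {
        rel: ("homogeneous" if rel[0] == rel[1] else "bipartite", es)
        for rel, es in blocks.items()
    }


# Hypothetical author-paper-term network.
vertex_type = {"a1": "Author", "p1": "Paper", "t1": "Term", "t2": "Term"}
edges = [("a1", "p1"), ("p1", "t1"), ("p1", "t2"), ("t1", "t2")]
for relation, (kind, block) in split_by_relation(vertex_type, edges).items():
    print(relation, kind, len(block))
```

Each returned block can then be handled independently, which is the property the generation procedure relies on.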

Following this interpretation, we extended BNOC to generate synthetic heterogeneous networks. The extension iterates over each pair of vertex and connection types specified in the network schema, employing similar steps to build the communities in each iteration:

  1. Execute m iterations of BNOC’s Steps 1 and 2 to build each layer i with \(V_i\) vertices, set a single community on each layer and introduce the overlapping structures.

  2. For each pair of layers specified in the schema, execute BNOC’s Steps 3, 4 and 5 in order to establish the specified connections, weights, density and noise levels.
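The two phases above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the HNOC implementation: BNOC’s Steps 1 to 5 are not reproduced in this appendix, so they are replaced here by simple stand-ins (uniform random community assignment, and an edge between two vertices with probability d only when they share a community index). Edge weights and noise are omitted. The function name `sketch_hnoc` is hypothetical; the parameter names v, c, x, e and d follow the caption of Fig. 20, and the values of c and d in the usage example are made up.

```python
import random


def sketch_hnoc(v, c, x, e, d, seed=0):
    """Toy two-phase generator loosely following the outline above.

    v: number of vertices per layer      c: number of communities per layer
    x: overlapping vertices per layer    e: schema, list of (layer_i, layer_j)
    d: edge probability for each schema pair (stand-in for dispersion)
    """
    rng = random.Random(seed)

    # Phase 1: one pass per layer. Each vertex gets one community, then
    # x_i randomly chosen vertices receive a second, different community.
    membership = []
    for n_i, c_i, x_i in zip(v, c, x):
        layer = [{rng.randrange(c_i)} for _ in range(n_i)]
        for u in rng.sample(range(n_i), x_i):
            others = [k for k in range(c_i) if k not in layer[u]]
            if others:
                layer[u].add(rng.choice(others))
        membership.append(layer)

    # Phase 2: one pass per schema pair. Vertices are linked with
    # probability p when their community sets intersect (an assumption).
    edges = []
    for (i, j), p in zip(e, d):
        for a in range(v[i]):
            for b in range(v[j]):
                if i == j and a >= b:
                    continue  # same-layer pair: no self-loops or duplicates
                if membership[i][a] & membership[j][b] and rng.random() < p:
                    edges.append(((i, a), (j, b)))
    return membership, edges


# Layer sizes, schema and overlaps as in Fig. 20a; c and d are hypothetical.
membership, edges = sketch_hnoc(
    v=[15, 15, 15, 15], c=[3, 3, 3, 3], x=[8, 3, 0, 1],
    e=[(0, 1), (1, 2), (2, 3), (3, 1)], d=[0.3, 0.3, 0.3, 0.3],
)
print([sum(len(s) == 2 for s in layer) for layer in membership])  # [8, 3, 0, 1]
```

Keeping the per-layer phase separate from the per-pair phase mirrors the description above: community structure is fixed once per layer, so every pair of connected layers in the schema reuses the same memberships.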

Since heterogeneous networks have multiple layers and multiple connection types, the extension required modifying some BNOC parameters and introducing a few additional parameters, as described in Table 3. The following parameters were added: the number of layers m and the set of connected layers e, henceforth called the “schema”. Furthermore, the dispersion and noise parameters must be defined for each pair of connected layers in the schema, since each iteration handles one such pair.

Table 3 Modified and additional parameters in HNOC

Figure 20 illustrates two networks created with distinct parameter combinations. Unless stated otherwise, the parameter settings correspond to the default values given in Tables 1 and 3. Figure 20a depicts a 4-partite network with some community overlapping.

Fig. 20 Heterogeneous networks generated with HNOC presenting distinct topological structures and properties: red squares depict overlapping vertices and colored circles indicate non-overlapping vertices and their assigned community; line widths reflect the corresponding edge weights. (a) A 4-partite network with \(v=[15,15,15,15]\), \(e=[(0,1), (1,2), (2,3), (3,1)]\) and \(x=[8,3,0,1]\); (b) a heterogeneous network obtained with settings \(v=[40,25,15]\), \(e=[(0,1), (1,2), (2,2)]\), \(c=[2,2,4]\), and \(d=[0.45, 0.85, 0.15, 0.15]\). The network drawings were obtained with the technique described in [50] (color figure online)

The network was built so that all layers have the same number of communities, with the probability set to produce balanced communities, different numbers of overlapping vertices in each layer, and the same dispersion (edge density) d in all pairs of connected layers. Figure 20b illustrates a 3-partite network with heterogeneous structure and edges between vertices of the same type in one of the layers. The example can describe a hypothetical author-paper-term network, in which authors are connected with their papers, and terms are connected with their neighboring terms in the text and with the papers in which they appear. The upper left, central and upper right layers represent, respectively, the Term, Paper and Author entities. The network has been created so that the connections between the different pairs of entities display different patterns, e.g., there is a dense topology of connections between terms and papers, whereas connections between terms and terms, or between authors and papers, are sparser. The schema is inspired by the widely used real-world DBLP data (Digital Bibliography & Library Project; see Note 12), a computer science bibliographic dataset that relates documents, authors and terms.

Cite this article

Valejo, A., Góes, F., Romanetto, L. et al. A benchmarking tool for the generation of bipartite network models with overlapping communities. Knowl Inf Syst 62, 1641–1669 (2020). https://doi.org/10.1007/s10115-019-01411-9

  • DOI: https://doi.org/10.1007/s10115-019-01411-9
