A benchmarking tool for the generation of bipartite network models with overlapping communities

Valejo, Alan; Góes, Fabiana; Romanetto, Luzia; Ferreira de Oliveira, Maria Cristina; de Andrade Lopes, Alneu

doi:10.1007/s10115-019-01411-9

Regular Paper
Published: 19 October 2019

Volume 62, pages 1641–1669, (2020)

Knowledge and Information Systems

Abstract

Many real-world networks display hidden community structures with important potential implications for their dynamics. Many algorithms, highly relevant to network analysis, have been introduced to unveil community structures. Accurate assessment and comparison of alternative solutions are typically approached by benchmarking the target algorithm(s) on a set of diverse networks that exhibit a broad range of controlled features, ensuring that the assessment covers multiple representative properties. Tools have been developed to synthesize bipartite networks, but none of the previous solutions addresses the issue of generating networks with overlapping community structures. This is the motivation for the BNOC tool introduced in this paper. It allows synthesizing bipartite networks that mimic a wide range of features from real-world networks, including overlapping community structures. Multiple parameters ensure flexibility in controlling the scale and topological properties of the networks and embedded communities. BNOC’s applicability is illustrated by assessing and comparing two popular overlapping community detection algorithms on bipartite networks, namely HLC and OSLOM. Results reveal interesting features of the algorithms in this scenario and confirm the relevant role played by a suitable benchmarking tool. Finally, to validate our approach, we present results comparing networks synthesized with BNOC with those obtained with an existing benchmarking tool and with already established sets of synthetic networks, in two different scenarios.

Notes

1. https://snap.stanford.edu/data/.
2. http://konect.uni-koblenz.de/.
3. http://netwiki.amath.unc.edu/SharedData/SharedData.
4. https://networkdata.ics.uci.edu/.
5. http://www-personal.umich.edu/~mejn/netdata/.
6. http://vlado.fmf.uni-lj.si/pub/networks/data/.
7. https://igraph.org.
8. https://networkx.github.io/.
9. http://www.numpy.org/.
10. https://github.com/junipertcy/det_k_bisbm.
11. The tool will be made available immediately after paper acceptance.
12. https://dblp.uni-trier.de/.

References

[1] Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466(7307):761–764
[2] Akoglu L (2014) Quantifying political polarity based on bipartite opinion networks. In: Proceedings of the eighth international AAAI conference on weblogs and social media (ICWSM)
[3] Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Lecture notes in computer science 5781:13–28
[4] Muscoloni A, Cannistraci CV (2018) Leveraging the nonuniform PSO network model as a benchmark for performance evaluation in community detection and link prediction. New J Phys 20(6):063022
[5] Ali AM, Alvari H, Hajibagheri A, Lakkaraju K, Sukthankar G (2014) Synthetic generators for cloning social network data. In: Proceedings of the international conference on social informatics (SocInfo)
[6] Armstrong TG, Ponnekanti V, Borthakur D, Callaghan M (2013) LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the international conference on management of data (SIGMOD), pp 1185–1196
[7] Barabási AL, Bonabeau E (2003) Scale-free networks. Sci Am 288(5):60–69
[8] Barber MJ (2007) Modularity and community detection in bipartite networks. Phys Rev E Stat Nonlinear Soft Matter Phys 76(6):1–11
[9] Barrett CL, Beckman RJ, Khan M, Kumar VSA, Marathe MV, Stretz PE, Dutta T, Lewis B (2009) Generation and analysis of large synthetic social contact networks. In: Proceedings of the winter simulation conference (WSC ’09), pp 1003–1014
[10] Beckett SJ (2016) Improved community detection in weighted bipartite networks. R Soc Open Sci 3(1):140536
[11] Birmelé E (2009) A scale-free graph model based on bipartite graphs. Discrete Appl Math 157(10):2267–2284
[12] Boncz P (2013) LDBC: benchmarks for graph and RDF data management. In: Proceedings of the international database engineering and applications symposium, pp 1–2
[13] Capota M, Hegeman T, Iosup A, Prat-Pérez A, Erling O, Boncz P (2015) Graphalytics: a big data benchmark for graph-processing platforms. In: Proceedings of the graph data management experiences and systems (GRADES), pp 1–6
[14] Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the SIAM international conference on data mining (SDM), p 5
[15] Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge
[16] Cui Y, Wang X (2014) Uncovering overlapping community structures by the key bi-community and intimate degree in bipartite networks. Physica A Stat Mech Appl 407:7–14
[17] Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp P09008
[18] Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
[19] Du N, Wang B, Wu B, Wang Y (2008) Overlapping community detection in bipartite networks. In: Proceedings of the IEEE/WIC/ACM international conference on web intelligence, pp 176–179
[20] Faleiros TP, Rossi RG, de Andrade Lopes A (2017) Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs. Pattern Recognit Lett 87:127–138
[21] Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
[22] Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Exp 21(11):1129–1164
[23] Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826
[24] Grujić J (2008) Movies recommendation networks as bipartite graphs. In: Proceedings of the international conference on computational science (ICCS). Springer, Berlin, pp 576–583
[25] Hwang T, Sicotte H, Tian Z, Wu B, Kocher JP, Wigle DA, Kumar V, Kuang R (2008) Robust and efficient identification of biomarkers by classifying features on graphs. Bioinformatics 24(18):2023–2029
[26] Jonnalagadda A, Kuppusamy L (2016) A survey on game theoretic models for community detection in social networks. Soc Netw Anal Min 6(1):83
[27] Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E Stat Nonlinear Soft Matter Phys 80(5):1–11
[28] Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E Stat Nonlinear Soft Matter Phys 78(4):1–5
[29] Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PLoS One 6(4):1–18
[30] Largeron C, Mougel PN, Rabbany R, Zaïane OR (2015) Generating attributed networks with communities. PLoS One 10(4):1–21
[31] Larremore DB, Clauset A, Jacobs AZ (2014) Efficiently inferring community structure in bipartite networks. Phys Rev E 90:012805
[32] Latapy M, Magnien C, Vecchio ND (2008) Basic notions for the analysis of large two-mode networks. Soc Netw 30(1):31–48
[33] Lehmann S, Schwartz M, Hansen LK (2008) Biclique communities. Phys Rev E Stat Nonlinear Soft Matter Phys 78(1):1–9
[34] Li Z, Zhang S, Zhang X (2015) Mathematical model and algorithm for link community detection in bipartite networks. Am J Oper Res 5:421–434
[35] McDaid AF, Greene D, Hurley N (2011) Normalized mutual information to evaluate overlapping community finding algorithms. arXiv e-print arXiv:1110.2515
[36] Melamed D (2014) Community structures in bipartite networks: a dual-projection approach. PLoS One 9(5):1–5
[37] Moussiades L, Vakali A (2009) Benchmark graphs for the evaluation of clustering algorithms. In: Proceedings of the international conference on research challenges in information science (RCIS), pp 197–206
[38] Nettleton DF (2016) A synthetic data generator for online social network graphs. Soc Netw Anal Min 6(1):44
[39] Newman MEJ (2001a) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64:016131
[40] Newman MEJ (2001b) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64:016132
[41] Newman MEJ (2010) Networks: an introduction. Oxford University Press, New York
[42] Pasta MQ, Zaidi F (2016) Leveraging evolution dynamics to generate benchmark complex networks with community structures. arXiv e-print arXiv:1606.01169
[43] Pérez-Rosés H, Sebé F (2014) Synthetic generation of social network data with endorsements. arXiv e-print arXiv:1411.6273
[44] Pham MD, Boncz P, Erling O (2013) S3G2: a scalable structure-correlated social graph generator. In: Selected topics in performance evaluation and benchmarking: 4th TPC technology conference (TPCTC)
[45] Rabbany R, Takaffoli M, Fagnan J, Zaïane OR, Campello RJGB (2013) Communities validity: methodical evaluation of community mining algorithms. Soc Netw Anal Min 3(4):1039–1062
[46] Rees BS, Gallagher KB (2012) Overlapping community detection using a community optimized graph swarm. Soc Netw Anal Min 2(4):405–417
[47] Rosvall M, Delvenne JC, Schaub MT, Lambiotte R (2017) Different approaches to community detection. arXiv e-print arXiv:1712.06468
[48] Shi C, Li Y, Zhang J, Sun Y, Yu PS (2017) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37
[49] Souam F, Aitelhadj A, Baba-Ali R (2014) Dual modularity optimization for detecting overlapping communities in bipartite networks. Knowl Inf Syst 40(2):455–488
[50] Uslu T, Mehler A (2018) PolyViz: a visualization system for a special kind of multipartite graphs. In: Proceedings of the IEEE VIS 2018
[51] Valejo A, Drury B, Valverde-Rebaza J, de Andrade Lopes A (2014) Identification of related Brazilian Portuguese verb groups using overlapping community detection. In: Proceedings of the international conference on computational processing of the Portuguese language. Springer, Cham, pp 292–297
[52] Valejo A, Valverde-Rebaza JC, de Andrade Lopes A (2014) A multilevel approach for overlapping community detection. In: Proceedings of the Brazilian conference on intelligent systems (BRACIS). Springer, Berlin
[53] Valejo A, Oliveira MCRF, Filho GP, Lopes AA (2018) Multilevel approach for combinatorial optimization in bipartite network. Knowl-Based Syst 151:45–61. https://doi.org/10.1016/j.knosys.2018.03.021
[54] Yang Z, Perotti JI, Tessone CJ (2017) Hierarchical benchmark graphs for testing community detection algorithms. Phys Rev E 96:052311
[55] Zhang ZY, Ahn YY (2015) Community detection in bipartite networks using weighted symmetric binary matrix factorization. Int J Mod Phys C 26:1–14
[56] Zhong E, Fan W, Zhu Y, Yang Q (2013) Modeling the dynamics of composite social networks. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 937–945

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. This work has been partially supported by the State of São Paulo Research Foundation (FAPESP) Grants 15/14228-9 and 17/05838-3; and the Brazilian Federal Research Council (CNPq) Grants 302645/2015-2 and 301847/2017-7.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Science, University of São Paulo, P.O. Box 668, São Carlos, SP, 14560-970, Brazil
Alan Valejo, Fabiana Góes, Luzia Romanetto, Maria Cristina Ferreira de Oliveira & Alneu de Andrade Lopes

Corresponding author

Correspondence to Alan Valejo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A HNOC: Extension to k-partite and heterogeneous networks

As with bipartite networks, there is a lack of benchmarking tools for creating k-partite and heterogeneous networks to assess community detection and other algorithms. k-partite networks have k vertex types, rather than just two, and their edges connect only vertices of different types. In heterogeneous networks, this latter restriction is dropped, i.e., edges can also connect vertices of the same type. Since both are direct generalizations of bipartite networks, we extended BNOC to synthesize general heterogeneous networks. This extension, called HNOC, can be used to generate heterogeneous information network (HIN) models that support the development and validation of new methods. HNOC inherits BNOC’s major features as a flexible and robust resource for synthesizing, in reasonable time, a variety of benchmarking networks with distinct properties.
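The distinction between bipartite, k-partite and heterogeneous networks drawn above can be made concrete with a small classifier. This is a minimal sketch, not part of HNOC: the function name `classify_network` and the input representation (a vertex-to-type dictionary plus an edge list) are assumptions chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def classify_network(vertex_type: Dict[Hashable, str],
                     edges: Iterable[Tuple[Hashable, Hashable]]) -> str:
    """Hypothetical helper: label a typed graph as bipartite, k-partite or heterogeneous."""
    k = len(set(vertex_type.values()))
    if k < 2:
        raise ValueError("at least two vertex types are required")
    if any(vertex_type[u] == vertex_type[v] for u, v in edges):
        return "heterogeneous"        # some edge joins two vertices of the same type
    return "bipartite" if k == 2 else f"{k}-partite"
```

For example, a three-type graph whose edges all cross types is reported as `"3-partite"`, while adding a single edge between two vertices of the same type turns it into `"heterogeneous"`.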

A heterogeneous information network (HIN) [48] consists of m disjoint subsets of vertices of different types (called layers) and edges connecting these elements. A network \(\mathcal {X}\) with m vertex types can be partitioned into subsets \(X_i = \{ x_{i,1}, \dots , x_{i,n_i} \}\) for each type i. A HIN is represented as a graph \(G = (V, E, W)\), where \(V = \bigcup _{i=1}^m X_i\) with \(m > 2\), E is the set of edges, and W is the set of edge weights.
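The definition \(G = (V, E, W)\) with \(V = \bigcup_{i=1}^m X_i\) can be mirrored by a small container that enforces disjoint layers and edges between existing vertices. This is an illustrative sketch under stated assumptions: the class `HIN`, the helper `make_layer` and the string labels `"x{i},{j}"` for \(x_{i,j}\) are hypothetical, and edges are stored as a dictionary from vertex pairs to weights so that E and W come from one structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def make_layer(i: int, n_i: int) -> List[str]:
    """Hypothetical helper: X_i = {x_{i,1}, ..., x_{i,n_i}} as string labels."""
    return [f"x{i},{j}" for j in range(1, n_i + 1)]


@dataclass
class HIN:
    """G = (V, E, W): V is the union of m disjoint layers X_1, ..., X_m."""
    layers: Dict[int, List[str]]                               # i -> X_i
    weights: Dict[Edge, float] = field(default_factory=dict)  # edge in E -> weight in W

    def __post_init__(self) -> None:
        if len(self.layers) <= 2:
            raise ValueError("the definition assumes m > 2 vertex types")
        seen: Set[str] = set()
        for i, xs in self.layers.items():
            if seen.intersection(xs):                # layers must be disjoint
                raise ValueError(f"layer {i} overlaps an earlier layer")
            seen.update(xs)
        for u, v in self.weights:
            if u not in seen or v not in seen:
                raise ValueError(f"edge ({u}, {v}) has an endpoint outside V")

    @property
    def V(self) -> Set[str]:
        return set().union(*self.layers.values())

    @property
    def E(self) -> Set[Edge]:
        return set(self.weights)
```

A three-layer instance with sizes 3, 2 and 4 yields \(|V| = 9\), and reusing a vertex label in two layers is rejected, matching the requirement that the subsets \(X_i\) be disjoint.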

HINs have an inherently complex structure that can be difficult to handle and visualize. Their structure of connections can be described by a network schema [48], a meta-template that specifies the vertex and connection types of the network. Given a graph G, its network schema, denoted \(T_G(\mathcal {A}, \mathcal {R})\), is a directed graph defined over the element types \(\mathcal {A}\), with edges given by the relations in \(\mathcal {R}\); vertices and edges of G are assigned to types and relations by the mapping functions \(\varphi : V \rightarrow \mathcal {A}\) and \(\psi : E \rightarrow \mathcal {R}\), respectively. Figure 18 exemplifies network schemas for a bipartite network, a k-partite network and a heterogeneous network.
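The schema \(T_G(\mathcal{A}, \mathcal{R})\) can be derived mechanically from the two mappings \(\varphi\) and \(\psi\). The sketch below is illustrative only: the function name `network_schema`, the dictionary encoding of \(\varphi\) and \(\psi\), and the rule that each relation always links the same ordered pair of types are assumptions, not part of HNOC.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]
SchemaArc = Tuple[str, str, str]          # (source type, relation, target type)


def network_schema(phi: Dict[Hashable, str],
                   psi: Dict[Edge, str]) -> Tuple[Set[str], Set[SchemaArc]]:
    """Hypothetical helper: build T_G(A, R) from phi: V -> A and psi: E -> R."""
    endpoints: Dict[str, Tuple[str, str]] = {}
    for (u, v), rel in psi.items():
        pair = (phi[u], phi[v])
        # Assumption: a relation always connects the same ordered pair of types.
        if endpoints.setdefault(rel, pair) != pair:
            raise ValueError(f"relation {rel!r} links {endpoints[rel]} and {pair}")
    types = set(phi.values())
    arcs = {(src, rel, dst) for rel, (src, dst) in endpoints.items()}
    return types, arcs
```

With a toy author-paper-term graph and hypothetical relation names such as `"writes"`, `"appears_in"` and `"co_occurs"`, the result is three vertex types and one schema arc per relation, including a Term-to-Term self-loop in the schema.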

The schema explicitly identifies the m vertex types and the r connection types. Each connection type is described by the types of the two endpoints and the connection meaning, since pairs of entities can admit multiple types of connections, as in the case of multi-relational networks [56]. Thus, a HIN can be interpreted as a composition of r bipartite (or homogeneous, if both endpoints of the connections are of the same type) networks, as illustrated in Fig. 19.
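The interpretation of a HIN as a composition of r bipartite or homogeneous networks amounts to grouping edges by connection type. The following sketch illustrates that decomposition; the function `split_by_relation`, the encoding of \(\varphi\) and \(\psi\) as dictionaries, and the assumption that every relation joins one fixed pair of endpoint types are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def split_by_relation(phi: Dict[Hashable, str],
                      psi: Dict[Edge, str]) -> Dict[str, Tuple[str, List[Edge]]]:
    """Hypothetical helper: one subnetwork per connection type in R.

    Returns relation -> (kind, edges), where kind is 'bipartite' when the two
    endpoint types differ and 'homogeneous' when they coincide.
    """
    edges_by_rel: Dict[str, List[Edge]] = defaultdict(list)
    for edge, rel in psi.items():
        edges_by_rel[rel].append(edge)
    parts: Dict[str, Tuple[str, List[Edge]]] = {}
    for rel, edges in edges_by_rel.items():
        u, v = edges[0]               # assumes one fixed pair of endpoint types per relation
        kind = "homogeneous" if phi[u] == phi[v] else "bipartite"
        parts[rel] = (kind, edges)
    return parts
```

Each returned subnetwork can then be handled by a bipartite (or homogeneous) procedure on its own, which is the view HNOC builds on.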

Following this interpretation, we extended BNOC to generate synthetic heterogeneous networks. The extension iterates over each pair of vertex and connection types specified in the network schema, employing similar steps to build the communities in each iteration:

1. Execute m iterations of BNOC’s Steps 1 and 2 to build each layer i with \(V_i\) vertices, set a single community on each layer, and introduce the overlapping structures.
2. For each pair of layers specified in the schema, execute BNOC’s Steps 3, 4 and 5 to establish the specified connections, weights, density and noise levels.
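The two-phase structure above (build every layer, then connect every schema pair) can be sketched in Python. This is a deliberately simplified, hypothetical stand-in: it does not reproduce BNOC’s actual Steps 1 to 5, which are defined in the main body of the paper, and the names `build_layer`, `connect_pair` and `hnoc_sketch`, the contiguous balanced partition, the one-extra-community overlap rule, the use of dispersion as an edge probability, and the noise model are all assumptions. Edge weights are omitted. It also assumes that every layer has the same number k of communities, so that community c of one layer is paired with community c of another.

```python
import random
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple

Vertex = Tuple[int, int]       # (layer index i, position j within the layer)
Pair = Tuple[int, int]         # schema entry (layer i, layer j); i == j allowed


def build_layer(i: int, n_i: int, k: int, overlap: int,
                rng: random.Random) -> List[Set[Vertex]]:
    """Stand-in for Steps 1-2: k balanced communities plus `overlap` shared vertices."""
    if n_i < k or (overlap > 0 and k < 2):
        raise ValueError("need n_i >= k, and k >= 2 whenever overlap > 0")
    comms: List[Set[Vertex]] = [set() for _ in range(k)]
    for j in range(n_i):
        comms[j * k // n_i].add((i, j))          # contiguous, roughly equal blocks
    for j in rng.sample(range(n_i), overlap):
        home = j * k // n_i
        extra = rng.choice([c for c in range(k) if c != home])
        comms[extra].add((i, j))                 # vertex now belongs to two communities
    return comms


def connect_pair(ci: List[Set[Vertex]], cj: List[Set[Vertex]], d: float,
                 noise: float, rng: random.Random) -> Set[FrozenSet[Vertex]]:
    """Stand-in for Steps 3-5 on one schema pair (weights omitted)."""
    edges: Set[FrozenSet[Vertex]] = set()
    same_layer = ci is cj
    for a, b in zip(ci, cj):                     # community c of layer i <-> community c of layer j
        pairs = combinations(sorted(a), 2) if same_layer else product(a, b)
        for u, v in pairs:
            if rng.random() < d:                 # keep each candidate pair with probability d
                edges.add(frozenset((u, v)))
    pool_i = sorted(set().union(*ci))
    pool_j = sorted(set().union(*cj))
    for _ in range(int(noise * len(edges))):     # noise: random edges ignoring communities
        u, v = rng.choice(pool_i), rng.choice(pool_j)
        if u != v:
            edges.add(frozenset((u, v)))
    return edges


def hnoc_sketch(sizes: Dict[int, int], k: int, overlaps: Dict[int, int],
                schema: Dict[Pair, Tuple[float, float]], seed: int = 0):
    """Phase 1: build every layer; phase 2: connect every pair listed in the schema."""
    rng = random.Random(seed)
    layers = {i: build_layer(i, n, k, overlaps.get(i, 0), rng) for i, n in sizes.items()}
    edges: Set[FrozenSet[Vertex]] = set()
    for (i, j), (d, noise) in schema.items():
        edges |= connect_pair(layers[i], layers[j], d, noise, rng)
    return layers, edges
```

A schema entry such as `(2, 2)` produces edges between vertices of the same layer, which is how this sketch reaches heterogeneous rather than purely k-partite structures.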

Since heterogeneous networks have multiple layers and multiple connection types, the extension required modifying some BNOC parameters and introducing a few additional ones, as described in Table 3. Two parameters were added: the number of layers m and the set e of connected layer pairs, henceforth called the “schema”. Furthermore, the dispersion and noise parameters must be defined for each pair of connected layers in the schema, since each iteration handles one such pair.
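Because dispersion and noise are now specified per connected pair, a configuration can be checked for consistency before generation. The following is a hypothetical validation sketch: the function name `validate_hnoc_params`, the encoding of e as a list of `(i, j)` layer-index pairs, and the dictionaries keyed by those pairs are assumptions (HNOC’s actual parameter names are those listed in Table 3), and requiring values in [0, 1] assumes both parameters are fractions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]        # (layer i, layer j) with 0 <= i, j < m; i == j allowed


def validate_hnoc_params(m: int, schema: List[Pair],
                         dispersion: Dict[Pair, float],
                         noise: Dict[Pair, float]) -> None:
    """Raise ValueError if the per-pair parameters do not match the schema e."""
    if m <= 2:
        raise ValueError("a HIN as defined here has m > 2 layers")
    for i, j in schema:
        if not (0 <= i < m and 0 <= j < m):
            raise ValueError(f"pair {(i, j)} refers to a layer outside 0..{m - 1}")
        for name, table in (("dispersion", dispersion), ("noise", noise)):
            if (i, j) not in table:
                raise ValueError(f"{name} is missing for pair {(i, j)}")
            if not 0.0 <= table[(i, j)] <= 1.0:
                raise ValueError(f"{name} for pair {(i, j)} must lie in [0, 1]")
    unused = (set(dispersion) | set(noise)) - set(schema)
    if unused:
        raise ValueError(f"parameters given for pairs outside the schema: {sorted(unused)}")
```

For instance, a 4-partite configuration whose schema is a chain of layers (an assumed layout) with the same dispersion on every pair passes, whereas omitting the noise value for one pair is rejected.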

Table 3 Modified and additional parameters in HNOC

Figure 20 illustrates two networks created with distinct parameter combinations. Unless stated otherwise, the parameter settings correspond to the default values given in Tables 1 and 3. Figure 20a depicts a 4-partite network with some overlap between communities.

The network was built so that all layers have the same number of communities, with the probability set to produce balanced communities, a different number of overlapping vertices in each layer, and the same dispersion (edge density) d for all pairs of connected layers.

Figure 20b illustrates a 3-partite network with a heterogeneous structure, in which one of the layers also has edges between vertices of the same type. The example can describe a hypothetical author-paper-term network, in which authors are connected to their papers, and terms are connected both to their neighboring terms in the text and to the papers in which they appear. The upper left, central and upper right layers represent, respectively, the Term, Paper and Author entities. The network was created so that the connections between different pairs of entities display different patterns, e.g., connections between terms and papers are dense, whereas connections between terms, or between authors and papers, are sparser. The schema is inspired by the widely used real-world DBLP data (Digital Bibliography & Library Project; see note 12), a computer science bibliographic dataset that relates documents, authors and terms.
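The contrast in connection patterns described for Fig. 20b comes down to assigning a different dispersion to each schema pair. The toy sketch below isolates that idea only: it ignores community structure entirely, draws each candidate edge independently with the pair’s dispersion, and reports the realized density per pair. The layer sizes, the dispersion values and all function names are hypothetical and were not taken from the paper.

```python
import random
from itertools import combinations, product
from typing import Dict, List, Tuple

# Hypothetical layer sizes for a toy author-paper-term network.
SIZES: Dict[str, int] = {"Term": 60, "Paper": 40, "Author": 25}
# Hypothetical per-pair dispersion: Term-Paper dense, Term-Term and Author-Paper sparse.
SCHEMA: Dict[Tuple[str, str], float] = {
    ("Term", "Paper"): 0.30,
    ("Term", "Term"): 0.05,
    ("Author", "Paper"): 0.05,
}


def sample_pair(a: str, b: str, d: float, rng: random.Random) -> List[Tuple[int, int]]:
    """Keep each candidate vertex pair between layers a and b with probability d."""
    if a == b:
        candidates = combinations(range(SIZES[a]), 2)   # same type: unordered pairs, no loops
    else:
        candidates = product(range(SIZES[a]), range(SIZES[b]))
    return [(u, v) for u, v in candidates if rng.random() < d]


def realized_density(a: str, b: str, edges: List[Tuple[int, int]]) -> float:
    """Fraction of possible edges between layers a and b that are present."""
    n_a, n_b = SIZES[a], SIZES[b]
    possible = n_a * (n_a - 1) // 2 if a == b else n_a * n_b
    return len(edges) / possible


def densities(seed: int = 42) -> Dict[Tuple[str, str], float]:
    rng = random.Random(seed)
    return {(a, b): realized_density(a, b, sample_pair(a, b, d, rng))
            for (a, b), d in SCHEMA.items()}


if __name__ == "__main__":
    for (a, b), dens in densities().items():
        print(f"{a}-{b}: target {SCHEMA[(a, b)]:.2f}, realized {dens:.3f}")
```

The realized densities stay close to their targets, so the Term-Paper block comes out clearly denser than the Term-Term and Author-Paper blocks, as in the described example.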

About this article

Cite this article

Valejo, A., Góes, F., Romanetto, L. et al. A benchmarking tool for the generation of bipartite network models with overlapping communities. Knowl Inf Syst 62, 1641–1669 (2020). https://doi.org/10.1007/s10115-019-01411-9

Received: 24 December 2017
Revised: 24 September 2019
Accepted: 28 September 2019
Published: 19 October 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s10115-019-01411-9