Abstract
Most networks, such as those generated from social media, evolve gradually, with frequent changes in the activity and interactions of their participants. Furthermore, the communities inside the network can grow, shrink, merge, or split, and the entities can move from one community to another. The aim of community detection methods is precisely to detect the evolution of these communities. However, evaluating these algorithms requires tests on real or artificial networks with verifiable ground truth. Dynamic network generators have recently been proposed for this task, but most of them consider only the structure of the network and disregard the characteristics of the nodes. In this paper, we propose a new generator for dynamic attributed networks with community structure that follow the properties of real-world networks. The evolution of the network is performed using two kinds of operations: micro-operations are applied to the edges and vertices, while macro-operations are applied to the communities. Moreover, properties of real-world networks such as preferential attachment and homophily are preserved during the evolution of the network, as confirmed by our experiments.
References
Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Data Min Knowl Discov 19(2):194–209
Akoglu L et al (2008) RTM: laws and a recursive generator for weighted time-evolving graphs. In: Eighth IEEE international conference on data mining, 2008 (ICDM’08). IEEE, pp 701–706
Albert R, Barabási A-L (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47–97
Amaral LAN et al (2000) Classes of small-world networks. Proc Natl Acad Sci 97(21):11149–11152
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Benson AR et al (2014) Learning multifractal structure in large networks. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1326–1335
Chung F, Lu L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882
Dang TA (2012) Analysis of communities in social networks. Ph.D. thesis, Université Paris 13
Easley D, Kleinberg J (2010) Networks, crowds and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge
Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
Gong NZ et al (2012) Evolution of social-attribute networks: measurements, modeling, and implications using Google+. In: ACM conference on internet measurement conference (IMC). ACM, pp 131–144
Görke R et al (2012) An efficient generator for clustered dynamic random networks. Springer, Berlin
Görke R, Staudt C (2009) A generator for dynamic clustered random graphs. Tech. rep., ITI Wagner, Department of Informatics, Universität Karlsruhe. Informatik, Uni Karlsruhe, TR 2009-7
Granell C et al (2015) A benchmark model to assess community structure in evolving networks. CoRR arXiv:1501.05808
Holland PW, Leinhardt S (1971) Transitivity in structural models of small groups. Comp Group Stud 2:107–124
Kim M, Leskovec J (2012) Multiplicative attribute graph model of real-world networks. Internet Math 8(1–2):113–160
Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118
Lancichinetti A et al (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110
Largeron C et al (2015) Generating attributed networks with communities. PLoS ONE 10(4):e0122777
Lazarsfeld PF, Merton RK (1954) Friendship as a social process: a substantive and methodological analysis. Freedom Control Mod Soc 18(1):18–66
Leskovec J et al (2008) Microscopic evolution of social networks. In: ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 462–470
Leskovec J et al (2005a) Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In: Knowledge discovery in databases: PKDD 2005. Springer, Berlin, pp 133–145
Leskovec J et al (2010) Kronecker graphs: an approach to modeling networks. J Mach Learn Res 11:985–1042
Leskovec J et al (2005b) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp 177–187
McPherson M et al (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27(1):415–444
Milgram S (1967) The small-world problem. Psychol Today 2:60–67
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
Palla G et al (2010) Multifractal network generator. Proc Natl Acad Sci 107(17):7640–7645
Pfeiffer JJ III et al (2014) Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd international conference on World Wide Web. ACM, pp 831–842
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
Wong LH et al (2006) A spatial model for social networks. Phys A Stat Mech Its Appl 360(1):99–120
Appendices
Appendix 1: Additional functions
See Table 4.
Appendix 2: User manual
The DANCer software and a detailed user manual are available at http://perso.univ-st-etienne.fr/largeron/DANC_Generator/. The user interface of the DANCer generator consists of three views, as shown in Fig. 9.
1.1 Graph parameters
The parameters are set in the left panel. They correspond to the parameters of the algorithms given in Table 2 and are detailed below.
1.1.1 Communities
- K: Number of communities in the first graph;
- n: Number of vertices in the first graph;
- Nb. Rep.: Number of representatives in each community. The higher the value, the slower the computation;
- Theta: Percentage of vertices assigned to a random community. The higher this value, the less likely the communities are to be homogeneous w.r.t. the attributes.
1.1.2 Attributes
- Nb. Attr.: Number of real-valued attributes associated with the vertices. Each attribute follows a centered normal distribution (mean equal to 0);
- Dev. i: Standard deviation of the ith attribute.
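As a minimal sketch of what these two parameters describe, the snippet below draws one attribute vector per vertex, with attribute i sampled from a normal distribution of mean 0 and standard deviation Dev. i. The function name `sample_attributes` and its arguments are hypothetical and are not part of DANCer; the sketch only illustrates the distribution and does not reproduce how the generator assigns vertices to communities.

```python
import random

def sample_attributes(n_vertices, deviations, seed=None):
    """Draw one attribute vector per vertex.

    Hypothetical helper: attribute i is sampled from N(0, deviations[i]),
    mirroring the 'Nb. Attr.' and 'Dev. i' parameters.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [[rng.gauss(0.0, dev) for dev in deviations]
            for _ in range(n_vertices)]

# Two attributes (Nb. Attr. = 2) with standard deviations 1.0 and 2.5.
attrs = sample_attributes(n_vertices=5, deviations=[1.0, 2.5], seed=42)
```

Calling the function again with the same seed returns exactly the same values, which is the behavior a seed parameter is meant to provide.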
1.1.3 Edges
- Edges Within: Maximum number of within community edges added to a newly inserted vertex;
- Edges Between: Maximum number of between community edges added to a newly inserted vertex;
- MTE: Minimum number of edges in the resulting graph (up to a graph where communities are cliques).
1.1.4 Micro-dynamic
- Proba Micro: The probability of performing a micro-update operation;
- Add Vertex: The ratio of vertices created at each timestamp. When set to 1, the number of vertices is doubled at each timestamp;
- Remove Vertex: The ratio of vertices removed at each timestamp;
- Update Attr.: The ratio of vertices having their attribute values updated;
- Add Btw. Edges: The ratio of edges inserted connecting two vertices in different communities;
- Remove Btw. Edges: The ratio of edges removed connecting two vertices in different communities;
- Add Wth. Edges: The ratio of edges inserted connecting two vertices in the same community;
- Remove Wth. Edges: The ratio of edges removed connecting two vertices in the same community.
1.1.5 Macro-dynamic
- Timestamps: The number of timestamps (i.e., the number of single graphs generated to form the dynamic network);
- Proba Merge: The probability of performing a merge operation at a single timestamp (i.e., merging two communities into a single one);
- Proba Split: The probability of performing a split operation at a single timestamp (i.e., splitting one community into two);
- Proba Migrate: The probability of performing a migrate operation at a single timestamp (i.e., migrating vertices from a community to either a new or an existing community).
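The effect of the three macro-operations on community membership can be sketched as follows. This is an illustration only: it tracks a vertex-to-community mapping and ignores the edge updates that the generator performs. The function names, and the choice of moving a random half of the vertices in `split`, are assumptions made for this example.

```python
import random

def merge(membership, c1, c2):
    """Merge community c2 into community c1."""
    return {v: (c1 if c == c2 else c) for v, c in membership.items()}

def split(membership, c, new_c, rng):
    """Split community c in two.

    Assumption: a random half of its vertices moves to new_c.
    """
    members = sorted(v for v, com in membership.items() if com == c)
    moved = set(rng.sample(members, len(members) // 2))
    return {v: (new_c if v in moved else com)
            for v, com in membership.items()}

def migrate(membership, vertices, target):
    """Move the given vertices to an existing or a new community."""
    return {v: (target if v in vertices else com)
            for v, com in membership.items()}

rng = random.Random(0)
membership = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
merged = merge(membership, 0, 1)              # one community left
halved = split(membership, 1, 2, rng)         # community 1 becomes 1 and 2
shifted = migrate(membership, {0, 1}, 3)      # vertices 0 and 1 found community 3
```

In a simulation loop, each operation would be triggered at a timestamp only when a uniform random draw falls below the corresponding probability (Proba Merge, Proba Split, or Proba Migrate).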
1.1.6 Network reproduction
- Seed parameter: The seed used for the random number generator. Reusing the same seed reproduces exactly the same network.
1.2 Graph visualization and manipulation
The central part of the user interface, as shown in Fig. 9, displays the generated network and the changes in its communities at each time step. Each graph in the sequence can be viewed separately in the Graph tab. The sequence of graphs can also be browsed with the timestamp scrollbar at the right side of the panel.
For each plotted graph, the Graph View tab offers several options (see Fig. 32), for example to hide or display the edges and vertices through the Graph View section of the right side panel. The graph can be displayed with different layouts (Kamada-Kawai, Fruchterman-Reingold, or self-organizing map), and the sizes of the plotted vertices can be set according to their degree, age, or community membership. Moreover, the displayed vertices can be selected or filtered according to the events that affected them, as described in the micro-dynamic operations, from the Select Vertices panel.
In the plotted graph, vertices of the same color are members of the same community. The user can interactively select or manipulate a vertex (or a group of vertices) using the cursor. The information for each node (id, degree, attributes) is displayed when the cursor points to a vertex.
The community dynamics (see Figs. 24, 25, 26) are available through the Community Dynamics tabs, in the central part of the user interface. These tabs display the size and the evolution of the different communities in the sequence of graphs according to the macro-dynamic operations (split, merge, and migrate).
1.3 Graph measures
Several measures, listed in Table 3, such as modularity or homophily, are computed on each graph of the dynamic network to describe its properties, notably P1, P2, P3, P4, and P5 detailed in Sect. 2. The changes in these measures over the sequence of graphs are presented at the bottom of the interface, as Fig. 9 shows.
1.3.1 Attribute measures
- Observed homophily: Ratio of edges connecting similar vertices w.r.t. their attribute values;
- Expected homophily: Ratio of pairs of similar vertices among all possible pairs of vertices;
The difference between the expected and observed homophily indicates whether vertices that are similar according to their attributes tend to be more connected than dissimilar vertices (cf. P5);
- Within inertia: Measure of the dispersion of the attribute values inside the communities (cf. P4). A low within inertia indicates that the communities are highly homogeneous with regard to the attribute values.
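The three attribute measures can be sketched as below. Several choices here are assumptions rather than the definitions used by DANCer: two vertices are treated as similar when the Euclidean distance between their attribute vectors is at most a threshold, and the within inertia is taken as the mean squared distance of each vertex to the centroid of its community. All function names are hypothetical.

```python
from itertools import combinations
from math import dist

def is_similar(a, b, threshold):
    # Assumption: similarity means Euclidean distance <= threshold.
    return dist(a, b) <= threshold

def observed_homophily(edges, attrs, threshold):
    """Share of edges whose two endpoints are similar."""
    hits = sum(is_similar(attrs[u], attrs[v], threshold) for u, v in edges)
    return hits / len(edges)

def expected_homophily(attrs, threshold):
    """Share of similar pairs among all pairs of vertices."""
    pairs = list(combinations(attrs.values(), 2))
    return sum(is_similar(a, b, threshold) for a, b in pairs) / len(pairs)

def within_inertia(attrs, community):
    """Mean squared distance to the community centroid (assumed normalization)."""
    groups = {}
    for v, c in community.items():
        groups.setdefault(c, []).append(attrs[v])
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(x, centroid) ** 2 for x in members)
    return total / len(attrs)

# Toy graph: two tight attribute groups linked by one cross edge.
attrs = {0: (0.0,), 1: (0.1,), 2: (5.0,), 3: (5.2,)}
community = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (2, 3), (1, 2)]
obs = observed_homophily(edges, attrs, threshold=0.5)
exp = expected_homophily(attrs, threshold=0.5)
inertia = within_inertia(attrs, community)
```

On this toy graph two of the three edges join similar vertices, while only two of the six possible pairs are similar, so the observed homophily exceeds the expected one.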
1.3.2 Structural measures
- Modularity: the modularity of the partition, as defined by [28] (cf. P3);
- Average clustering coefficient: an indication of the transitivity of connections in the network [32];
- Random clustering coefficient: the clustering coefficient of an Erdős–Rényi random graph having the same number of vertices and edges;
The average clustering coefficient of the network measures its clustering tendency (cf. P3). This observed value can be compared with the expected value computed on a random graph having the same vertex set: an observed value higher than the expected value confirms the community structure;
- Average degree: the average number of neighbors of the vertices (cf. P1);
- Average shortest path length: the average minimum number of hops required to connect two arbitrary vertices (cf. P2). It is not computed when the graph is formed by several disconnected components (i.e., \(E^{\mathrm{max}}_{btw}=0\));
- Diameter: length of the longest shortest path between any pair of vertices (cf. P2);
- Nb. edges between: number of edges connecting two vertices belonging to different communities;
- Nb. edges within: number of edges connecting two vertices belonging to the same community (cf. P3);
- Nb. edges: total number of edges in the graph, i.e., \(\mathcal{E}\).
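Several of the structural measures above can be computed directly from an edge list, as in the following sketch. It assumes a simple undirected graph whose vertices are numbered 0 to n-1, uses the standard community-sum form of modularity, and approximates the random clustering coefficient by the edge density 2m/(n(n-1)) of a random graph with the same numbers of vertices and edges. The function name `structural_measures` is hypothetical.

```python
from collections import defaultdict

def structural_measures(n, edges, community):
    """Return a dict of measures for a simple undirected graph."""
    m = len(edges)
    adj = defaultdict(set)
    intra = defaultdict(int)    # edges inside each community
    degree = defaultdict(int)   # sum of vertex degrees per community
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    within = sum(intra.values())
    # Q = sum over communities of e_c/m - (d_c / 2m)^2
    modularity = sum(intra[c] / m - (degree[c] / (2 * m)) ** 2
                     for c in degree)

    def local_clustering(v):
        nb = adj[v]
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        return 2 * links / (k * (k - 1))

    return {
        "edges": m,
        "edges_within": within,
        "edges_between": m - within,
        "average_degree": 2 * m / n,
        "modularity": modularity,
        "average_clustering": sum(local_clustering(v) for v in range(n)) / n,
        "random_clustering": 2 * m / (n * (n - 1)),
    }

# Two triangles joined by a single edge, one community per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
measures = structural_measures(6, edges, community)
```

For this example the average clustering coefficient is well above the random one, which is the comparison the interface makes to confirm the community structure.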
1.3.3 Degree distribution
The bottom of the user interface also includes a panel displaying the distribution of vertex degrees for each graph of the sequence, as shown in Fig. 33.
1.4 Output files
The generated dynamic network can be saved as a collection of files, one for each time step, under the out directory located in the same working directory as the generator. For each graph of the sequence, the file with the extension “.graph” indicates the composition of the graph (vertices and edges), and the “parameters” file enumerates all the parameters used by the generator.
- Parameters: The parameters are written to a separate file. Each line starts with the parameter name, followed by its value.
- Vertices: In the graph file, the vertices section starts with the line # Vertices. Each subsequent line describes a vertex. A line consists of an integer corresponding to the vertex id, the list of its attribute values separated by “; ” and an integer corresponding to the vertex community id.
- Edges: This section starts with the line # Edges. Each subsequent line corresponds to an edge. A line is composed of two vertex ids separated by a “; ”.
- Measures: The measures are saved in a separate file. Each line gives the measure name and its successive values at each time step.
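A reader for the graph file described above might look like the sketch below. The sample content is invented for illustration, and two details are assumptions: every field on a vertex line (id, attributes, community id) is taken to be separated by “;”, and whitespace around the separator is ignored. The function name `parse_graph` is hypothetical.

```python
def parse_graph(text):
    """Parse the '# Vertices' and '# Edges' sections of a .graph file.

    Returns (vertices, edges) where vertices maps a vertex id to
    (attribute list, community id) and edges is a list of id pairs.
    """
    vertices, edges, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header, e.g. "# Vertices" or "# Edges".
            section = line.lstrip("#").strip().lower()
            continue
        fields = [f.strip() for f in line.split(";")]
        if section == "vertices":
            vertex_id = int(fields[0])
            attributes = [float(x) for x in fields[1:-1]]
            vertices[vertex_id] = (attributes, int(fields[-1]))
        elif section == "edges":
            edges.append((int(fields[0]), int(fields[1])))
    return vertices, edges

# Invented sample following the described layout.
sample = """# Vertices
0; 0.12; -1.5; 0
1; 0.30; -1.1; 0
2; 2.75; 0.4; 1
# Edges
0; 1
1; 2
"""
vertices, edges = parse_graph(sample)
```

Since one file is produced per time step, the same function can be applied to each file of the out directory in turn to rebuild the sequence of graphs.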
Appendix 3: Benchmark profiles
Table 5 presents a first network (Configuration 1, obtained with the parameters given in Table 6) that has a good community structure with respect to both the relationships and the attributes, followed by three other networks in which the link-based structure, the attribute-based structure, or both are weakened. Table 7 presents modifications related to the dynamics of the first network. The parameter settings are given for each network, as well as its characteristics (modularity and within inertia).
Cite this article
Largeron, C., Mougel, P.N., Benyahia, O. et al. DANCer: dynamic attributed networks with community structure generation. Knowl Inf Syst 53, 109–151 (2017). https://doi.org/10.1007/s10115-017-1028-2
DOI: https://doi.org/10.1007/s10115-017-1028-2