Probabilistic analysis of communities and inner roles in networks: Bayesian generative models and approximate inference

Costa, Gianni; Ortale, Riccardo

doi:10.1007/s13278-013-0130-z

Original Article
Published: 31 August 2013

Volume 3, pages 1015–1038, (2013)
Social Network Analysis and Mining

Gianni Costa¹ &
Riccardo Ortale¹

Abstract

Two fundamental tasks in network analysis are community discovery and role assignment. Hitherto, these have been conducted separately. We argue that their integration provides a deeper understanding of connectivity patterns and present unsupervised learning approaches to the exploratory analysis of communities and inner roles of nodes across their interactions in directed networks. In particular, we propose two Bayesian probabilistic models of network interactions that seamlessly integrate community discovery and role assignment. One is the model of a generative process in which pairs of nodes in a network are associated with communities, and with roles in the context of their communities, before an interaction is possibly established between them. According to the generative semantics of such a model, nodes are represented as probability distributions over communities, while communities are viewed as probability distributions over roles. The other model specifies a generative process based on link partitioning, which associates pairs of nodes with respective roles and then assigns their possible interaction to one link community. This is accomplished by representing nodes as probability distributions over roles and, additionally, by explicitly modeling how roles interact with each other. The aforesaid distributions are unknown parameters of the proposed models, which are estimated from the observed network interactions through suitable Markov chain Monte Carlo algorithms for approximate posterior inference and parameter estimation. One model outperforms state-of-the-art competitors in link prediction over real-world networks. The other exhibits competitive predictive power. Both models outperform an established probabilistic competitor at identifying relatively recognizable structure in synthetic networks.

Author information

Authors and Affiliations

ICAR, CNR, Via P. Bucci, 41C, 87036, Rende (CS), Italy
Gianni Costa & Riccardo Ortale

Corresponding author

Correspondence to Gianni Costa.

Appendices

Appendix

The mathematical details concerning the BH-CRM model, in particular the derivation of the complete-data likelihood (Eq. 2), the Gibbs sampling updates (Eq. 4) and the estimation of its multinomial and Bernoulli parameters (Eqs. 6, 7, 8), are reported in the following.

Appendix 1: Complete-data likelihood

The complete-data likelihood $P( {\bf{L}}, {\bf{C}},{\bf{R}}| {\varvec{\alpha}}, \varvec{\beta}, \varvec{\gamma})$ is, according to Eq. 1, the product of three factors. In turn, each factor exploits the properties of conjugate prior and likelihood pairs to marginalize out specific parameters of the graphical model of Fig. 1 in closed form. The marginalization performed on the individual factors ultimately allows the complete-data likelihood to be expressed directly in terms of the hyperparameters. As discussed in Sect. 3.1, integrating out parameters captures the uncertainty associated with them and simplifies the resulting Gibbs sampling algorithm: the parameters are not represented as additional variables, so their values need not be sampled.

Notice that the first factor on the right-hand side of Eq. 1 corresponds to the marginal distribution $P({\bf{C}} | {\varvec{\alpha}}) = \int P({\bf{C}}|\varTheta) P(\varTheta|{\varvec{\alpha}}) {\rm d}\varTheta$, which involves the conjugate prior $P(\varTheta|{\varvec{\alpha}})$ and the community likelihood $P({\bf{C}}|\varTheta)$ given below

$$ \begin{aligned} P(\varTheta | \varvec{\alpha}) & = \prod_{i \in {\mathcal{N}}} {\rm Dirichlet} (\varvec{\vartheta}_{i}|\varvec{\alpha}) \\ & =\prod_{i \in {\mathcal{N}}} \frac{1}{\Updelta(\varvec{\alpha})} \prod_{k=1}^K \vartheta_{i,k}^{\alpha_k - 1}\\ P({\bf{C}} | \varTheta) & = \prod_{i,j \in {\mathcal{N}}} P(C^{(f)}_{ij} | \varTheta) \cdot P(C^{(t)}_{ji} | \varTheta) \\ & =\prod_{i \in {\mathcal{N}}} \prod_{k=1}^K \vartheta_{i,k}^{n^{(k)}_i} \\ \end{aligned} $$

where $n^{(k)}_i$ is the number of times that node i is assigned to community k. Thus, $P({\bf{C}} | {\varvec{\alpha}})$ is computed as follows

$$ \begin{aligned} P({\bf{C}}| \varvec{\alpha}) & = \int P({\bf{C}}| \varTheta) P (\varTheta|\varvec{\alpha}) {\rm d}\varTheta\\ & = \int \prod_{i \in {\mathcal{N}}} \prod_{k=1}^K \vartheta_{i,k}^{n_i^{(k)}} \cdot \prod_{i \in {\mathcal{N}}} \frac{1}{\Updelta(\varvec{\alpha})} \prod_{k=1}^K \vartheta_{i,k}^{\alpha_k - 1} {\rm d}\varvec{\vartheta}_i\\ & = \int \prod_{i \in {\mathcal{N}}} \prod_{k=1}^K \frac{1}{\Updelta(\varvec{\alpha})} \vartheta_{i,k}^{n_i^{(k)} + \alpha_k-1}{\rm d}\varvec{\vartheta}_i\\ & = \prod_{i \in {\mathcal{N}}} \frac{\Updelta({\bf{n}}_i + \varvec{\alpha})}{\Updelta(\varvec{\alpha})} \ {\text{with}} \ {\bf{n}}_i = \left \{n^{(k)}_i \right \}_{k=1}^K \end{aligned} $$

(9)
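As a numerical illustration of Eq. 9, the following minimal sketch evaluates $\log P({\bf{C}} | {\varvec{\alpha}})$ from a table of counts. It assumes that $\Updelta(\cdot)$ is the Dirichlet normaliser $\prod_k \varGamma(v_k) / \varGamma(\sum_k v_k)$, which is how the $\Updelta$ terms are expanded in Appendix 2. The function names and the toy counts are hypothetical and are not taken from the article.

```python
from math import lgamma, exp

def log_delta(v):
    """Log of the assumed Dirichlet normaliser: sum_k lgamma(v_k) - lgamma(sum_k v_k)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))

def log_marginal_communities(counts, alpha):
    """Log of Eq. 9: sum over nodes of log Delta(n_i + alpha) - log Delta(alpha).

    counts[i][k] plays the role of n_i^(k); alpha[k] is the k-th hyperparameter.
    """
    log_p = 0.0
    for n_i in counts:
        posterior = [n + a for n, a in zip(n_i, alpha)]
        log_p += log_delta(posterior) - log_delta(alpha)
    return log_p

# Hypothetical toy data: two nodes, K = 2 communities, symmetric alpha.
counts = [[1, 0], [2, 1]]
alpha = [1.0, 1.0]
log_p = log_marginal_communities(counts, alpha)
```

Working in log space with `lgamma` avoids overflow of the Gamma function when the counts grow, which is the usual reason for evaluating such ratios this way.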

The second factor on the right-hand side of Eq. 1 corresponds to the marginal distribution $P({\bf{R}} | {\bf{C}}, {\varvec{\beta}}) = \int P({\bf{R}}|{\bf{C}}, \varPhi) P(\varPhi|{\varvec{\beta}}) {\rm d}\varPhi$, which involves the conjugate prior $P(\varPhi|{\varvec{\beta}})$ and the role likelihood $P({\bf{R}}| {\bf{C}}, \varPhi)$ reported next

$$ \begin{aligned} P(\varPhi | \varvec{\beta}) & = \prod_{k=1}^K {\text{Dirichlet}} (\varvec{\varphi}_{k}|\varvec{\beta}) \\ & = \prod_{k=1}^K \frac{1}{\Updelta(\varvec{\beta})} \prod_{l=1}^H \varphi_{k,l}^{\beta_l - 1}\\ P({\bf{R}} | {\bf{C}}, \varPhi) & = \prod_{i,j \in {\mathcal{N}}} P(R^{(f)}_{ij} | {\bf{C}}, \varPhi) \cdot P(R^{(t)}_{ji} | {\bf{C}}^{(t)}, \varPhi) \\ & = \prod_{k = 1}^K \prod_{l=1}^H \varphi_{k,l}^{n^{(l)}_k} \end{aligned} $$

where $n^{(l)}_k$ is the number of times that role l is played within community k.

$P({\bf{R}} | {\bf{C}}, {\varvec{\beta}}) $ is calculated as follows

$$ \begin{aligned} P({\bf{R}}| {\bf{C}}, \varvec{\beta}) & = \int P({\bf{R}} | {\bf{C}}, \varPhi) P(\varPhi | \varvec{\beta}) {\rm d}\varPhi\\ & = \int \prod_{k=1}^K \prod_{l=1}^H \varphi_{k,l}^{n_k^{(l)}} \cdot \prod_{k=1}^K \prod_{l=1}^H \frac{1}{\Updelta(\varvec{\beta})} \varphi_{k,l}^{\beta_l -1} {\rm d}\varvec{\varphi}_k\\ & = \prod_{k=1}^K \frac{\Updelta({\bf{n}}_k +\varvec{\beta})}{\Updelta(\varvec{\beta})} \hbox{ with } {\bf{n}}_k = \left \{n^{(l)}_k \right \}_{l=1}^H \end{aligned} $$

(10)

The third factor on the right-hand side of Eq. 1 corresponds to the marginal distribution $P({\bf{L}} | {\bf{R}}, {\varvec{\gamma}}) = \int P({\bf{L}}|{\bf{R}}, \varXi) P(\varXi|{\varvec{\gamma}}) {\rm d}\varXi$, which involves the conjugate prior $P(\varXi|{\varvec{\gamma}})$ and the link likelihood $P({\bf{L}}| {\bf{R}}, \varXi)$ given below

$$ \begin{aligned} P(\varXi | \varvec{\gamma}) & = \prod_{r,r^{\prime} \in {\mathcal{R}}} {\rm Beta}(\xi_{r,r^{\prime}} | \varvec{\gamma})\\ & = \prod_{r,r^{\prime} \in {\mathcal{R}}} \frac{1}{B(\gamma_1,\gamma_2)} \left (\xi_{r,r^{\prime}} \right )^{\gamma_1 - 1} \left (1 - \xi_{r,r^{\prime}} \right )^{\gamma_2 - 1}\\ & = \prod_{r,r^{\prime} \in {\mathcal{R}}} \frac{\varGamma(\gamma_1 + \gamma_2)}{\varGamma(\gamma_1) \varGamma(\gamma_2)} \left (\xi_{r,r^{\prime}} \right )^{\gamma_1 - 1} \left (1 - \xi_{r,r^{\prime}} \right )^{\gamma_2 - 1}\\ P({\bf{L}}|{\bf{R}}, \varXi) & = \prod_{i,j \in {\mathcal{N}}} \left ( \xi_{R^{(f)}_{ij}, R^{(t)}_{ji}} \right)^{L_{ij}} \left ( 1 - \xi_{R^{(f)}_{ij}, R^{(t)}_{ji}} \right)^{1 - L_{ij}} \\ & = \prod_{r,r^{\prime} \in {\mathcal{R}}} \left ( \xi_{r,r^{\prime}} \right)^{n_{r,r^{\prime}}} \left ( 1 - \xi_{r,r^{\prime}} \right)^{\overline{n}_{r,r^{\prime}}} \end{aligned} $$

where $n_{r,r^{\prime}}$ is the number of directed links from nodes with role r to nodes with role $r^{\prime}.$ Accordingly, $\overline{n}_{r,r^{\prime}}$ is the number of missing links from role r to role $r^{\prime}$, i.e., $\overline{n}_{r,r^{\prime}} = |\{ (i,j) | R^{(f)}_{ij} = r, R^{(t)}_{ji} = r^{\prime}, L_{ij} = 0 \}| .$

$P({\bf{L}} | {\bf{R}}, {\varvec{\gamma}})$ is obtained in the following manner:

$$ \begin{aligned} P({\bf{L}}| {\bf{R}}, \varvec{\gamma}) &= \int P({\bf{L}}| {\bf{R}},\varXi) P(\varXi| \varvec{\gamma}){\rm d}\varXi\\ & = \int \prod_{r,r^{\prime} \in{\mathcal{R}}} \left ( \xi_{r,r^{\prime}} \right)^{n_{r,r^{\prime}}} \left ( 1 - \xi_{r,r^{\prime}} \right)^{\overline{n}_{r,r^{\prime}}}\\ & \quad \cdot \frac{1}{B(\gamma_1,\gamma_2)} \left ( \xi_{r,r^{\prime}} \right)^{\gamma_1 -1} \left ( 1 - \xi_{r,r^{\prime}} \right)^{\gamma_2 - 1} {\rm d}\varXi\\ & = \prod_{r,r^{\prime} \in{\mathcal{R}}} \frac{1}{B(\gamma_1,\gamma_2)} \\ & \quad \cdot \int_0^1 \left ( \xi_{r,r^{\prime}} \right)^{n_{r,r^{\prime}} + \gamma_1 - 1} \left ( 1 - \xi_{r,r^{\prime}} \right)^{\overline{n}_{r,r^{\prime}} + \gamma_2 - 1} {\rm d}\xi_{r,r^{\prime}}\\ & = \prod_{r,r^{\prime} \in{\mathcal{R}}} \frac{B(n_{r,r^{\prime}} + \gamma_1 , \overline{n}_{r,r^{\prime}} + \gamma_2 )}{B(\gamma_1,\gamma_2)}\\ & = \prod_{r,r^{\prime} \in {\mathcal{R}}} \frac{\varGamma(n_{r,r^{\prime}} + \gamma_1)\varGamma(\overline{n}_{r,r^{\prime}} + \gamma_2) \varGamma(\gamma_1+\gamma_2)}{\varGamma(n_{r,r^{\prime}} + \overline{n}_{r,r^{\prime}} + \gamma_1 + \gamma_2) \varGamma(\gamma_1)\varGamma(\gamma_2)} \end{aligned} $$

(11)
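Eq. 11 can be evaluated in the same spirit. The sketch below is an illustration under stated assumptions and not the authors' implementation: the dictionary layout mapping a role pair to its counts $(n_{r,r^{\prime}}, \overline{n}_{r,r^{\prime}})$, the function names and the toy values are all hypothetical.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_links(link_counts, gamma1, gamma2):
    """Log of Eq. 11: sum over role pairs of log B(n + g1, n_bar + g2) - log B(g1, g2).

    link_counts maps a role pair (r, r') to (n, n_bar), i.e. the numbers of
    present and missing directed links from role r to role r'.
    """
    return sum(
        log_beta(n + gamma1, n_bar + gamma2) - log_beta(gamma1, gamma2)
        for n, n_bar in link_counts.values()
    )

# Hypothetical toy counts for two role pairs, with a uniform Beta(1, 1) prior.
link_counts = {(0, 0): (1, 0), (0, 1): (1, 1)}
log_p_links = log_marginal_links(link_counts, 1.0, 1.0)
```

With a uniform prior the two role pairs contribute $B(2,1)/B(1,1) = 1/2$ and $B(2,2)/B(1,1) = 1/6$, so the marginal of this toy example is $1/12$.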

Finally, the (marginalized) complete-data likelihood of Eq. 2 follows from substituting Eqs. 9, 10, 11 into Eq. 1.

Appendix 2: Gibbs sampling

The basic updates of Eqs. 4 and 5, used for posterior inference in the context of the BH-CRM model of Fig. 1, are derived for the case $i \neq j, C^{(f)}_{ij} = k, C^{(t)}_{ji} = k^{\prime}, k \neq k^{\prime}, R^{(f)}_{ij} = r, R^{(t)}_{ji} = r^{\prime}$ and $r \neq r^{\prime}$. The exceptions arising in all other cases are treated in Appendixes 3, 4 and 5.

The starting point for the derivation of both Eq. 4 and Eq. 5 is the computation of $P\left( C^{(f)}_{ij} = k, R^{(f)}_{ij} = r, C^{(t)}_{ji} = k^{\prime}, R^{(t)}_{ji} = r^{\prime} | {\bf{C}}_{\neg{ij}}, {\bf{R}}_{\neg{ij}}, {\bf{L}}\right)$.

$$ \begin{aligned} & P\left( C^{(f)}_{ij} = k, R^{(f)}_{ij} = r, C^{(t)}_{ji} = k^{\prime}, R^{(t)}_{ji} = r^{\prime} | {\bf{C}}_{\neg{ij}}, {\bf{R}}_{\neg{ij}}, {\bf{L}}\right) \\ &\quad = \frac{P({\bf{L}}, {\bf{C}}, {\bf{R}})} {P({\bf{L}}, {\bf{C}}_{\neg{ij}}, {\bf{R}}_{\neg{ij}}) } = \frac{P({\bf{L}}, {\bf{C}}, {\bf{R}}) }{P({\bf{L}}_{\neg{ij}}, {\bf{C}}_{\neg{ij}}, {\bf{R}}_{\neg{ij}}) P(L_{ij})}\\ &\quad \propto \frac{P({\bf{L}}, {\bf{C}}, {\bf{R}}) }{P({\bf{L}}_{\neg{ij}}, {\bf{C}}_{\neg{ij}}, {\bf{R}}_{\neg{ij}}) }\\ & \quad = \frac{ P({\bf{L}} | {\bf{R}}) }{ P({\bf{L}}_{\neg{ij}} | {\bf{R}}_{\neg{ij}}) } \cdot \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } \cdot \frac{ P({\bf{C}}) }{ P({\bf{C}}_{\neg{ij}})} \end{aligned} $$

(12)

The three factors that appear on the rightmost side of Eq. 12 are dealt with separately below. We start with the first factor.

$$ \begin{aligned} \frac{ P({\bf{L}} | {\bf{R}}) }{ P({\bf{L}}_{\neg{ij}} | {\bf{R}}_{\neg{ij}}) } & = \prod_{r_1,r_2 \in {\mathcal{R}}} \frac{ \varGamma( n_{r_1,r_2} + \gamma_1 ) \varGamma( \overline{n}_{r_1,r_2} + \gamma_2 )} { \varGamma( n_{r_1,r_2} + \gamma_1 + \overline{n}_{r_1,r_2} + \gamma_2) } \\ & \quad\cdot \prod_{r_1,r_2 \in {\mathcal{R}}} \frac{ \varGamma( n_{r_1,r_2,\neg{ij}} + \gamma_1 + \overline{n}_{r_1,r_2,\neg{ij}} + \gamma_2) } { \varGamma( n_{r_1,r_2,\neg{ij}} + \gamma_1 ) \varGamma( \overline{n}_{r_1,r_2,\neg{ij}} + \gamma_2 )} \\ & = \frac{\varGamma(n_{r,r^{\prime}} + \gamma_1) }{ \varGamma(n_{r,r^{\prime},\neg{ij}} + \gamma_1) } \cdot \frac{ \varGamma(\overline{n}_{r,r^{\prime}} + \gamma_2) }{ \varGamma(\overline{n}_{r,r^{\prime},\neg{ij}} + \gamma_2) } \\ & \quad\cdot \frac{ \varGamma(n_{r,r^{\prime},\neg{ij}} + \gamma_1 + \overline{n}_{r,r^{\prime},\neg{ij}} + \gamma_2) }{ \varGamma(n_{r,r^{\prime}} + \gamma_1 + \overline{n}_{r,r^{\prime}} + \gamma_2) } \end{aligned} $$

where $n_{r_1,r_2,\neg{ij}}$ and $\overline{n}_{r_1,r_2,\neg{ij}}$ indicate that the present (or absent) link from node i to node j is excluded from the respective counts. All factors with $(r_1, r_2) \neq (r, r^{\prime})$ cancel, because excluding the pair (i, j) leaves their counts unchanged. Given the roles assumed for nodes i and j in Eq. 12, the proper evaluation of $n_{r,r^{\prime},\neg{ij}}$ and $\overline{n}_{r,r^{\prime},\neg{ij}}$ requires distinguishing two cases, i.e., $L_{ij} = 0$ and $L_{ij} = 1$. If $L_{ij} = 0$, there is no link from i to j. In such a case $n_{r,r^{\prime},\neg{ij}} = n_{r,r^{\prime}}$ and $\overline{n}_{r,r^{\prime},\neg{ij}} = \overline{n}_{r,r^{\prime}} - 1$, so that, by the identity $\varGamma(x + 1) = x \varGamma(x)$, it follows that

$$ \frac{ P({\bf{L}} | {\bf{R}}) }{ P({\bf{L}}_{\neg{ij}} | {\bf{R}}_{\neg{ij}}) } = \frac{\overline{n}_{r,r^{\prime}} - 1 + \gamma_2} {n_{r,r^{\prime}} + \gamma_1 + \overline{n}_{r,r^{\prime}} - 1 + \gamma_2} $$

(13)

Instead, if $L_{ij} = 1$, there is a link from node i to node j. Hence, $n_{r,r^{\prime},\neg{ij}} = n_{r,r^{\prime}} - 1$ and $\overline{n}_{r,r^{\prime},\neg{ij}} = \overline{n}_{r,r^{\prime}}$. Consequently,

$$ \frac{ P({\bf{L}} | {\bf{R}}) }{ P({\bf{L}}_{\neg{ij}} | {\bf{R}}_{\neg{ij}}) } = \frac{n_{r,r^{\prime}} - 1 + \gamma_1} {n_{r,r^{\prime}} - 1 + \gamma_1 + \overline{n}_{r,r^{\prime}} + \gamma_2} $$

(14)
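The simplification leading to Eqs. 13 and 14 can be checked numerically. The sketch below is a sanity check under stated assumptions: it compares the closed forms with the unsimplified ratio of Gamma functions for the single role pair $(r, r^{\prime})$ that survives the cancellation. The function names and test values are hypothetical, and the counts are assumed to include the pair (i, j) itself.

```python
from math import lgamma, exp, isclose

def link_factor(n, n_bar, gamma1, gamma2, l_ij):
    """Closed form of Eq. 13 (l_ij == 0) or Eq. 14 (l_ij == 1).

    n and n_bar are the counts of present and missing links for the role
    pair (r, r'), including the pair (i, j) under consideration.
    """
    total = n + gamma1 + n_bar + gamma2 - 1
    if l_ij == 1:
        return (n - 1 + gamma1) / total
    return (n_bar - 1 + gamma2) / total

def link_factor_from_gammas(n, n_bar, gamma1, gamma2, l_ij):
    """The same ratio evaluated from Gamma functions, before simplification."""
    # Counts with the pair (i, j) excluded.
    n_ex, n_bar_ex = (n - 1, n_bar) if l_ij == 1 else (n, n_bar - 1)
    log_ratio = (
        lgamma(n + gamma1) - lgamma(n_ex + gamma1)
        + lgamma(n_bar + gamma2) - lgamma(n_bar_ex + gamma2)
        + lgamma(n_ex + gamma1 + n_bar_ex + gamma2)
        - lgamma(n + gamma1 + n_bar + gamma2)
    )
    return exp(log_ratio)
```

For instance, with n = 3, n_bar = 5, gamma1 = 0.5 and gamma2 = 2.0, both functions give 6/9.5 for a missing link and 2.5/9.5 for a present link.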

We next focus on the second factor on the right-hand side of Eq. 12.

$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & = \prod_{c=1}^K \frac{ \Updelta(\varvec{n}_c + \varvec{\beta}) }{ \Updelta(\varvec{\beta})} \prod_{c=1}^K \frac{ \Updelta(\varvec{\beta}) }{ \Updelta(\varvec{n}_{c,\neg{ij},\neg{ji}} + \varvec{\beta})}\\ & = \frac{ \prod_{c=1}^K \Updelta({\bf{n}}_c + \varvec{\beta}) } {\prod_{c=1}^K \Updelta(\varvec{n}_{c,\neg{ij},\neg{ji}} + \varvec{\beta})}\\ & = \frac{ \Updelta({\bf{n}}_k + \varvec{\beta}) \Updelta({\bf{n}}_{k^{\prime}} + \varvec{\beta}) }{\Updelta(\varvec{n}_{k,\neg{ij}} + \varvec{\beta}) \Updelta(\varvec{n}_{k^{\prime},\neg{ji}} + \varvec{\beta})}\\ & = \frac{ \prod_{l=1}^H \varGamma(n^{(l)}_k + \beta_l) }{\varGamma(\sum_{l=1}^H n^{(l)}_k + \beta_l) } \cdot \frac{\varGamma(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}} + \beta_l) }{ \prod_{l=1}^H \varGamma(n^{(l)}_{k,\neg{ij}} + \beta_l)}\\ &\quad \cdot\frac{\prod_{l=1}^H \varGamma(n^{(l)}_{k^{\prime}} + \beta_l) }{\varGamma(\sum_{l=1}^H n^{(l)}_{k^{\prime}} + \beta_l) } \cdot \frac{\varGamma(\sum_{l=1}^H n^{(l)}_{k^{\prime},\neg{ji}} + \beta_l) }{\prod_{l=1}^H \varGamma(n^{(l)}_{k^{\prime},\neg{ji}} +\beta_l) }\\ \end{aligned} $$

where ${\varvec{n}}_{c,\neg{ij},\neg{ji}}$ indicates that roles $R^{(f)}_{ij}$ and $R^{(t)}_{ji}$ are excluded from the communities of nodes i and j, respectively. Formally, for the generic role l and community c, $n^{(l)}_{c, \neg{ij}} = n^{(l)}_c - 1$ if $C^{(f)}_{ij} = c$ and $R^{(f)}_{ij} = l$. Otherwise, $n^{(l)}_{c, \neg{ij}} = n^{(l)}_c$. Likewise, if $C^{(t)}_{ji} = c$ and $R^{(t)}_{ji} = l$, $n^{(l)}_{c, \neg{ij}, \neg{ji}} = n^{(l)}_{c, \neg{ij}} - 1$, whereas $n^{(l)}_{c, \neg{ij}, \neg{ji}} = n^{(l)}_{c, \neg{ij}} $ in all other cases. Due to the hypothesis $C^{(f)}_{ij} \neq C^{(t)}_{ji}$, the two separate counts ${\varvec{n}}_{k, \neg{ij}}$ and ${\varvec{n}}_{k^{\prime}, \neg{ji}}$ remain as the arguments of the $\Updelta$ functions in the denominator. In particular, since $n^{(r)}_{k,\neg{ij}} = n^{(r)}_k - 1$ and $n^{(r^{\prime})}_{k^{\prime},\neg{ji}} = n^{(r^{\prime})}_{k^{\prime}} - 1$, it follows that

$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & =\frac{ \varGamma(n^{(r)}_k + \beta_r) }{ \varGamma(n^{(r)}_{k, \neg{ij}} + \beta_r) } \cdot \frac{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}} + \beta_l\right) }{ \varGamma\left(\sum_{l=1}^H n^{(l)}_k + \beta_l\right) }\\ & \quad \cdot \frac{ \varGamma(n^{(r^{\prime})}_{k^{\prime}} + \beta_{r^{\prime}}) }{ \varGamma(n^{(r^{\prime})}_{k^{\prime}, \neg{ji}} + \beta_{r^{\prime}}) } \cdot \frac{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k^{\prime},\neg{ji}} + \beta_l\right) }{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k^{\prime}} + \beta_l\right) }\\ & = \frac{ n^{(r)}_k + \beta_r - 1 }{ \left ( \sum_{l=1}^H n^{(l)}_k + \beta_l \right ) - 1 } \\ & \quad \cdot \frac{ n^{(r^{\prime})}_{k^{\prime}} + \beta_{r^{\prime}} - 1 }{ \left ( \sum_{l=1}^H n^{(l)}_{k^{\prime}} + \beta_l \right ) - 1 } \end{aligned} $$

(15)
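The step from the $\Updelta$ ratios to the closed form of Eq. 15 can be verified numerically in the same way. The following sketch assumes $k \neq k^{\prime}$, reads $\sum_l n^{(l)}_k + \beta_l$ as the sum over roles of $n^{(l)}_k + \beta_l$, and uses hypothetical function names and count vectors; it is an illustration, not the authors' code.

```python
from math import lgamma, exp, isclose

def log_delta(v):
    """Log of the Dirichlet normaliser: sum_l lgamma(v_l) - lgamma(sum_l v_l)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))

def role_factor(n_k, n_kp, beta, r, r_prime):
    """Closed form of Eq. 15 for distinct communities k and k'.

    n_k[l] and n_kp[l] play the roles of n_k^(l) and n_k'^(l); both include
    the current role assignments of nodes i and j.
    """
    left = (n_k[r] + beta[r] - 1) / (sum(n_k) + sum(beta) - 1)
    right = (n_kp[r_prime] + beta[r_prime] - 1) / (sum(n_kp) + sum(beta) - 1)
    return left * right

def role_factor_from_delta(n_k, n_kp, beta, r, r_prime):
    """The same quantity as a product of two Delta ratios, before simplification."""
    def ratio(counts, role):
        included = [n + b for n, b in zip(counts, beta)]
        excluded = list(included)
        excluded[role] -= 1  # remove the current assignment from the count
        return exp(log_delta(included) - log_delta(excluded))
    return ratio(n_k, r) * ratio(n_kp, r_prime)
```

With H = 3 roles, n_k = [2, 1, 0], n_kp = [1, 3, 2], all beta equal to 0.5, r = 0 and r_prime = 1, both functions return (1.5/3.5)(2.5/6.5).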

Finally, the third factor of Eq. 12 is computed.

$$ \begin{aligned} \frac{ P({\bf{C}}) } {P({\bf{C}}_{\neg{ij}})} & = \prod_{u \in {\mathcal{N}}} \frac{\Updelta( {\bf{n}}_u + \varvec{\alpha} )}{ \Updelta(\varvec{\alpha} ) } \prod_{u \in {\mathcal{N}}} \frac{ \Updelta(\varvec{\alpha}) }{ \Updelta( \varvec{n}_{u, \neg{ij}, \neg{ji}} + \varvec{\alpha} )}\\ & = \frac{ \prod_{u \in {\mathcal{N}}} \Updelta( {\bf{n}}_u + \varvec{\alpha} )} {\prod_{u \in {\mathcal{N}}} \Updelta( \varvec{n}_{u, \neg{ij}, \neg{ji}} + \varvec{\alpha})}\\ & = \frac{\Updelta({\bf{n}}_i + \varvec{\alpha}) }{\Updelta (\varvec{n}_{i, \neg{ij}} + \varvec{\alpha})} \cdot \frac{ \Updelta({\bf{n}}_j + \varvec{\alpha}) } { \Updelta(\varvec{n}_{j, \neg{ji}} + \varvec{\alpha})}\\ & = \frac{ \prod_{c=1}^K \varGamma(n^{(c)}_i + \alpha_c) }{ \varGamma\left(\sum_{c=1}^K n^{(c)}_i +\alpha_c\right)} \cdot \frac{ \varGamma\left(\sum_{c=1}^K n^{(c)}_{i,\neg{ij}} +\alpha_c\right)} { \prod_{c=1}^K \varGamma(n^{(c)}_{i,\neg{ij}} + \alpha_c) } \\ & \quad \cdot \frac{ \prod_{c=1}^K \varGamma(n^{(c)}_j + \alpha_c) } { \varGamma\left(\sum_{c=1}^K n^{(c)}_j +\alpha_c\right)} \cdot \frac{ \varGamma\left(\sum_{c=1}^K n^{(c)}_{j,\neg{ji}} +\alpha_c\right) }{ \prod_{c=1}^K \varGamma(n^{(c)}_{j,\neg{ji}} + \alpha_c)} \end{aligned} $$

where ${\varvec{n}}_{u,\neg{ij},\neg{ji}}$ indicates that communities $C^{(f)}_{ij}$ and $C^{(t)}_{ji}$ are excluded from the community memberships of nodes i and j, respectively. More precisely, for the generic node u and community c, $n^{(c)}_{u,\neg{ij}} = n^{(c)}_u - 1$ if $u = i$ and $C^{(f)}_{ij} = c$, whereas $n^{(c)}_{u,\neg{ij}} = n^{(c)}_u$ otherwise. Analogously, $n^{(c)}_{u,\neg{ij}, \neg{ji}} = n^{(c)}_{u, \neg{ij}} - 1$ if $u = j$ and $C^{(t)}_{ji} = c$. In all other cases $n^{(c)}_{u,\neg{ij}, \neg{ji}} = n^{(c)}_{u, \neg{ij}}$. Due to the assumption $i \neq j$, the two separate counts ${\varvec{n}}_{i,\neg{ij}}$ and ${\varvec{n}}_{j,\neg{ji}}$ remain as the arguments of the $\Updelta$ functions in the denominator. In particular, since $n^{(k)}_{i,\neg{ij}} = n^{(k)}_i - 1$ and $n^{(k^{\prime})}_{j,\neg{ji}} = n^{(k^{\prime})}_j - 1$, we find that

$$ \begin{aligned} \frac{ P({\bf{C}}) } {P({\bf{C}}_{\neg{ij}})} & = \frac{\varGamma(n^{(k)}_i + \alpha_k)}{\varGamma(n^{(k)}_{i,\neg{ij}} + \alpha_k)} \cdot \frac{\varGamma\left(\sum_{k=1}^K n^{(k)}_{i,\neg{ij}} +\alpha_k\right)}{\varGamma\left(\sum_{k=1}^K n^{(k)}_i +\alpha_k\right)}\\ & \quad \cdot \frac{ \varGamma(n^{(k^{\prime})}_j + \alpha_{k^{\prime}}) }{ \varGamma(n^{(k^{\prime})}_{j,\neg{ji}} +\alpha_{k^{\prime}}) } \cdot\frac{\varGamma\left(\sum_{k=1}^K n^{(k)}_{j,\neg{ji}} +\alpha_k\right)}{\varGamma\left(\sum_{k=1}^K n^{(k)}_j+\alpha_k\right)}\\ & = \frac{n^{(k)}_i + \alpha_k - 1}{\left(\sum_{k=1}^K n^{(k)}_i + \alpha_k \right) -1}\\ & \quad \cdot \frac{ n^{(k^{\prime})}_j + \alpha_{k^{\prime}} - 1 }{ \left( \sum_{k=1}^K n^{(k)}_j + \alpha_k \right) -1} \end{aligned} $$

(16)

By plugging Eqs. 13, 15 and 16 into Eq. 12, we obtain Eq. 4. Equation 5 is obtained by plugging Eq. 14 instead of Eq. 13 into Eq. 12. Recall that Eq. 4 is the sampling step when $i \neq j$, $C^{(f)}_{ij} \neq C^{(t)}_{ji}$ and $R^{(f)}_{ij} \neq R^{(t)}_{ji}$. In all other cases, the right-hand side of Eq. 12 varies and can be obtained by multiplying the factors computed in the following Appendixes 3, 4 and 5.
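To make the combination concrete, the sketch below multiplies the link factor (Eq. 13 or Eq. 14) by the role and community factors (Eqs. 15 and 16) to obtain an unnormalised sampling weight for the case $i \neq j$, $k \neq k^{\prime}$, $r \neq r^{\prime}$. Eqs. 4 and 5 themselves are not reproduced in this appendix, so this is only an illustration of the product in Eq. 12. The function names, the tuple layout and the use of `random.choices` for the final draw are assumptions.

```python
import random

def decrement_ratio(count, total, prior, prior_total):
    """(count + prior - 1) / (total + prior_total - 1), the pattern shared by Eqs. 15 and 16."""
    return (count + prior - 1) / (total + prior_total - 1)

def unnormalised_weight(l_ij, n, n_bar, gamma, role_terms, community_terms):
    """Product of the three factors of Eq. 12 for i != j, k != k', r != r'.

    role_terms and community_terms are each a pair of
    (count, total, prior, prior_total) tuples, one for the source side and
    one for the target side. All counts include the current assignment.
    """
    gamma1, gamma2 = gamma
    link_total = n + gamma1 + n_bar + gamma2 - 1
    if l_ij == 1:
        weight = (n - 1 + gamma1) / link_total      # Eq. 14
    else:
        weight = (n_bar - 1 + gamma2) / link_total  # Eq. 13
    for term in (*role_terms, *community_terms):
        weight *= decrement_ratio(*term)
    return weight

def sample_assignment(weights, rng):
    """Draw one candidate key with probability proportional to its weight."""
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]

# Hypothetical counts for one candidate assignment (k, r, k', r') with L_ij = 0.
w = unnormalised_weight(
    0, 3, 5, (0.5, 2.0),
    role_terms=((2, 3, 0.5, 1.5), (3, 6, 0.5, 1.5)),
    community_terms=((4, 10, 1.0, 2.0), (1, 7, 1.0, 2.0)),
)
```

In a full sampler the weight would be recomputed for every candidate tuple, since the counts depend on the candidate, and the resulting dictionary of weights would then be passed to `sample_assignment`.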

Appendix 3: Role exceptions

There can be three possible exceptions in the computation of the second factor (i.e., Eq. 15) on the right-hand side of Eq. 12. First, $C^{(f)}_{ij} \neq C^{(t)}_{ji}$ and $R^{(f)}_{ij} = R^{(t)}_{ji}$. Moreover, nodes i and j can belong to the same community, i.e., $C^{(f)}_{ij} = C^{(t)}_{ji}$, and in this latter case we must further distinguish between $R^{(f)}_{ij} \neq R^{(t)}_{ji}$ and $R^{(f)}_{ij} = R^{(t)}_{ji}$. These three alternative cases are treated next.

1. Case $C^{(f)}_{ij} \neq C^{(t)}_{ji}$ and $R^{(f)}_{ij} = R^{(t)}_{ji}$. Let $C^{(f)}_{ij} = k$ and $C^{(t)}_{ji} = k^{\prime}$ with $k \neq k^{\prime}$. Also, assume that $R^{(f)}_{ij} = R^{(t)}_{ji} = r$. Then, it holds that $n^{(r)}_{k, \neg{ij}} = n^{(r)}_k - 1$ and $n^{(r)}_{k^{\prime}, \neg{ji}} = n^{(r)}_{k^{\prime}} - 1.$ Hence,
$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & = \frac{ \varGamma(n^{(r)}_k + \beta_r) }{ \varGamma(n^{(r)}_{k, \neg{ij}} + \beta_r)} \cdot \frac{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}} + \beta_l\right)}{ \varGamma\left(\sum_{l=1}^H n^{(l)}_k + \beta_l\right)}\\ &\quad \cdot \frac{\varGamma(n^{(r)}_{k^{\prime}} + \beta_{r}) }{ \varGamma(n^{(r)}_{k^{\prime}, \neg{ji}} + \beta_{r}) } \cdot \frac{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k^{\prime},\neg{ji}} + \beta_l\right)}{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k^{\prime}} + \beta_l\right)}\\ & = \frac{n^{(r)}_k + \beta_r - 1 }{ \left ( \sum_{l=1}^H n^{(l)}_k + \beta_l \right ) - 1 } \\ & \quad \cdot \frac{ n^{(r)}_{k^{\prime}} + \beta_{r} - 1 }{ \left ( \sum_{l=1}^H n^{(l)}_{k^{\prime}} + \beta_l \right ) - 1 } \end{aligned} $$
(17)
2.
Case $C^{(f)}_{ij} = C^{(t)}_{ji}$ and $R^{(f)}_{ij} \neq R^{(t)}_{ji}$. Let $C^{(f)}_{ij} = C^{(t)}_{ji} = k$. Additionally, assume that $R^{(f)}_{ij} = r$ and $R^{(t)}_{ji} = r^{\prime}$ (with $r \neq r^{\prime}$). Then,
$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & = \frac{ \prod_{c=1}^K \Updelta({\bf{n}}_c + \varvec{\beta}) } {\prod_{c=1}^K \Updelta(\varvec{n}_{c,\neg{ij},\neg{ji}} + \varvec{\beta})}\\ & = \frac{ \Updelta({\bf{n}}_k + \varvec{\beta} ) }{ \Updelta(\varvec{n}_{k, \neg{ij}, \neg{ji}} + \varvec{\beta})}\\ & = \frac{ \prod_{l=1}^H \varGamma(n^{(l)}_k + \beta_l) }{\varGamma\left(\sum_{l=1}^H n^{(l)}_k + \beta_l\right) } \\ & \quad \cdot \frac{\varGamma\left(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}, \neg{ji}} + \beta_l\right) }{\prod_{l=1}^H \varGamma(n^{(l)}_{k,\neg{ij}, \neg{ji}} + \beta_l)} \end{aligned} $$

By hypothesis $R^{(f)}_{ij} \neq R^{(t)}_{ji}$ and, therefore, the count ${\varvec{n}}_{k, \neg{ij}, \neg{ji}}$ can be decomposed into the separate counts ${\varvec{n}}_{k, \neg{ij}}$ and ${\varvec{n}}_{k, \neg{ji}}.$ Moreover, since $n^{(r)}_{k, \neg{ij}} = n^{(r)}_k - 1$ and $n^{(r^{\prime})}_{k, \neg{ji}} = n^{(r^{\prime})}_k - 1$, it follows that
$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & = \frac{ \varGamma(n^{(r)}_k + \beta_r) \cdot \varGamma(n^{(r^{\prime})}_k + \beta_{r^{\prime}})}{ \varGamma(n^{(r)}_{k, \neg{ij}} + \beta_r) \cdot \varGamma(n^{(r^{\prime})}_{k, \neg{ji}} + \beta_{r^{\prime}}) }\\ &\quad \cdot \frac{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}, \neg{ji}} + \beta_l\right) } { \varGamma\left(\sum_{l=1}^H n^{(l)}_{k} + \beta_l\right) }\\ & =\frac{ \left(n^{(r)}_k + \beta_r - 1 \right)} {\left[\left( \sum_{l=1}^H n^{(l)}_{k} + \beta_l \right) - 1 \right] } \\ & \quad \cdot \frac{\left(n^{(r^{\prime})}_k + \beta_{r^{\prime}} -1 \right)}{\left[\left( \sum_{l=1}^H n^{(l)}_{k} + \beta_l \right) -2 \right]} \end{aligned} $$
(18)
3.
Case $C^{(f)}_{ij} = C^{(t)}_{ji}$ and $R^{(f)}_{ij} = R^{(t)}_{ji}$. Let $C^{(f)}_{ij} = C^{(t)}_{ji} = k$. Also, assume that $R^{(f)}_{ij} = R^{(t)}_{ji} = r$. With these assumptions, $n^{(r)}_{k, \neg{ij}, \neg{ji}} = n^{(r)}_k - 2.$ Therefore,
$$ \begin{aligned} \frac{ P({\bf{R}} | {\bf{C}}) }{ P({\bf{R}}_{\neg{ij}} | {\bf{C}}_{\neg{ij}}) } & = \frac{ \varGamma(n^{(r)}_k + \beta_r)}{ \varGamma(n^{(r)}_{k, \neg{ij}, \neg{ji}} + \beta_r)} \\ & \quad \cdot \frac{\varGamma\left(\sum_{l=1}^H n^{(l)}_{k,\neg{ij}, \neg{ji}} + \beta_l\right) }{ \varGamma\left(\sum_{l=1}^H n^{(l)}_{k} + \beta_l\right) }\\ & = \frac{\left(n^{(r)}_k + \beta_r - 1 \right)} {\left[\left( \sum_{l=1}^H n^{(l)}_{k} + \beta_l \right) - 1 \right] } \\ & \quad \cdot\frac{\left(n^{(r)}_k + \beta_{r} - 2 \right) }{\left[\left( \sum_{l=1}^H n^{(l)}_{k} + \beta_l \right) -2 \right]} \end{aligned} $$
(19)
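The three closed forms in Eqs. 17, 18 and 19 all follow from the recurrence $\varGamma(x) = (x-1)\,\varGamma(x-1)$. The following is a minimal numerical sketch, not part of the original derivation, that compares each closed form with the corresponding ratio of Dirichlet normalisers evaluated through log-Gamma functions. The function names and the toy counts are hypothetical; the count vectors are assumed to include the assignments of the pair currently being resampled.

```python
from math import lgamma, exp, isclose

def log_delta(v):
    """Log of Delta(v) = prod_l Gamma(v_l) / Gamma(sum_l v_l)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))

def gamma_ratio(n, beta, removed):
    """Delta(n + beta) / Delta(n_reduced + beta); `removed` lists the role
    indices whose count drops by one when the pair is discarded."""
    full = [c + b for c, b in zip(n, beta)]
    reduced = list(full)
    for r in removed:
        reduced[r] -= 1
    return exp(log_delta(full) - log_delta(reduced))

def eq17(n_k, n_kp, beta, r):
    # Different communities k and k', same role r.
    s_k = sum(n_k) + sum(beta)
    s_kp = sum(n_kp) + sum(beta)
    return (n_k[r] + beta[r] - 1) / (s_k - 1) * (n_kp[r] + beta[r] - 1) / (s_kp - 1)

def eq18(n_k, beta, r, rp):
    # Same community k, different roles r and r'.
    s = sum(n_k) + sum(beta)
    return (n_k[r] + beta[r] - 1) / (s - 1) * (n_k[rp] + beta[rp] - 1) / (s - 2)

def eq19(n_k, beta, r):
    # Same community k, same role r.
    s = sum(n_k) + sum(beta)
    return (n_k[r] + beta[r] - 1) / (s - 1) * (n_k[r] + beta[r] - 2) / (s - 2)

# Toy counts for H = 3 roles (hypothetical values).
beta = [0.5, 0.5, 0.5]
n_k = [4, 2, 5]
n_kp = [1, 5, 2]

check17 = isclose(eq17(n_k, n_kp, beta, r=0),
                  gamma_ratio(n_k, beta, [0]) * gamma_ratio(n_kp, beta, [0]))
check18 = isclose(eq18(n_k, beta, r=0, rp=2), gamma_ratio(n_k, beta, [0, 2]))
check19 = isclose(eq19(n_k, beta, r=0), gamma_ratio(n_k, beta, [0, 0]))
print(check17, check18, check19)
```

Working in log space avoids overflow of the Gamma function for large counts; the closed forms are what a sampler would evaluate in practice, since they need only a few arithmetic operations.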

Appendix 4: Community exceptions

Three exceptions must be taken into account in the calculation of the third factor (i.e., Eq. 16) in the right-hand side of Eq. 12. First, two distinct nodes $i$ and $j$ can belong to the same community, i.e., $C^{(f)}_{ij} = C^{(t)}_{ji}$. Second, nodes $i$ and $j$ can coincide, i.e., $i = j$, and in this case $C^{(f)}_{ij} \neq C^{(t)}_{ji}$ and $C^{(f)}_{ij} = C^{(t)}_{ji}$ must be considered separately. These three exceptions are considered individually below.

1.
Case $i \neq j$ and $C^{(f)}_{ij} = C^{(t)}_{ji}$. Let $C^{(f)}_{ij} = C^{(t)}_{ji} = k$. Then, $n^{(k)}_{i, \neg{ij}} = n^{(k)}_i - 1$ and $n^{(k)}_{j, \neg{ji}} = n^{(k)}_j - 1$. Hence,
$$ \begin{aligned} \frac{ P({\bf{C}}) } {P({\bf{C}}_{\neg{ij}})} & = \frac{\varGamma(n^{(k)}_i + \alpha_k)}{\varGamma(n^{(k)}_{i,\neg{ij}} + \alpha_k)} \cdot \frac{\varGamma\left(\sum_{c=1}^K n^{(c)}_{i,\neg{ij}} +\alpha_c\right)}{\varGamma\left(\sum_{c=1}^K n^{(c)}_i +\alpha_c\right)}\\ & \quad \cdot\frac{ \varGamma(n^{(k)}_j + \alpha_{k}) }{ \varGamma(n^{(k)}_{j,\neg{ji}} +\alpha_{k}) } \cdot\frac{\varGamma\left(\sum_{c=1}^K n^{(c)}_{j,\neg{ji}} +\alpha_c\right)}{\varGamma\left(\sum_{c=1}^K n^{(c)}_j+\alpha_c\right)}\\ & = \frac{n^{(k)}_i + \alpha_k - 1 }{ \left( \sum_{c=1}^K n^{(c)}_i + \alpha_c \right) -1}\\ &\quad \cdot \frac{ n^{(k)}_j + \alpha_{k} - 1 }{\left(\sum_{c=1}^K n^{(c)}_j + \alpha_c \right) -1} \end{aligned} $$
(20)
2.
Case $i = j$ and $C^{(f)}_{ij} \neq C^{(t)}_{ji}$. Let $C^{(f)}_{ij} = k$ and $C^{(t)}_{ji} = k^{\prime}$ with $k \neq k^{\prime}$. Then,
$$ \begin{aligned} \frac{ P({\bf{C}})} {P({\bf{C}}_{\neg{ij}})}& =\frac{ \prod_{u \in {\mathcal{N}}} \Updelta( {\bf{n}}_u + \varvec{\alpha})} {\prod_{u \in {\mathcal{N}}}\Updelta( \varvec{n}_{u, \neg{ij}, \neg{ji}} +\varvec{\alpha} )}\\ &=\frac{\Updelta({\bf{n}}_i + \varvec{\alpha})} {\Updelta( \varvec{n}_{i, \neg{ij}, \neg{ji}} +\varvec{\alpha})}\\ &= \frac{ \prod_{c=1}^K \varGamma(n^{(c)}_i +\alpha_c) }{\varGamma\left(\sum_{c=1}^K n^{(c)}_i + \alpha_c\right)} \cdot\frac{\varGamma\left(\sum_{c=1}^K n^{(c)}_{i,\neg{ij}, \neg{ji}} + \alpha_c\right) }{ \prod_{c=1}^K \varGamma(n^{(c)}_{i,\neg{ij}, \neg{ji}} + \alpha_c)} \end{aligned} $$

By hypothesis $C^{(f)}_{ij} \neq C^{(t)}_{ji}$ and, hence, the count ${\varvec{n}}_{i, \neg{ij}, \neg{ji}}$ can be decomposed into the two separate counts ${\varvec{n}}_{i, \neg{ij}}$ and ${\varvec{n}}_{i, \neg{ji}}$. Moreover, since $n^{(k)}_{i, \neg{ij}} = n^{(k)}_i - 1$ and $n^{(k^{\prime})}_{i, \neg{ji}} = n^{(k^{\prime})}_i - 1$, it follows that
$$ \begin{aligned} \frac{ P({\bf{C}}) } {P({\bf{C}}_{\neg{ij}})} & = \frac{\varGamma\left(n^{(k)}_i + \alpha_k\right) \cdot \varGamma\left(n^{(k^{\prime})}_i + \alpha_{k^{\prime}}\right)} {\varGamma\left(n^{(k)}_{i,\neg{ij}} + \alpha_k\right) \cdot \varGamma\left(n^{(k^{\prime})}_{i,\neg{ji}} + \alpha_{k^{\prime}}\right)}\\ &\quad \cdot \frac{ \varGamma\left(\sum_{c=1}^K n^{(c)}_{i,\neg{ij}, \neg{ji}} + \alpha_c\right) } { \varGamma\left(\sum_{c=1}^K n^{(c)}_{i} + \alpha_c\right) }\\ &=\frac{ \left(n^{(k)}_i + \alpha_k - 1 \right)}{\left[\left( \sum_{c=1}^K n^{(c)}_{i} + \alpha_c \right) - 1 \right]}\\ &\quad \cdot\frac{\left(n^{(k^{\prime})}_i + \alpha_{k^{\prime}} - 1 \right)}{\left[\left( \sum_{c=1}^K n^{(c)}_{i} + \alpha_c \right) -2 \right]} \end{aligned} $$
(21)
3.
Case $i = j$ and $C^{(f)}_{ij} = C^{(t)}_{ji}$. Let $C^{(f)}_{ij} = C^{(t)}_{ji} = k$. We find that
$$ \begin{aligned} \frac{ P({\bf{C}}) } {P({\bf{C}}_{\neg{ij}})} & = \frac{\varGamma(n^{(k)}_i + \alpha_k) }{\varGamma(n^{(k)}_{i,\neg{ij}, \neg{ji}} + \alpha_k)} \\ & \quad \cdot \frac{\varGamma\left(\sum_{c=1}^K n^{(c)}_{i,\neg{ij}, \neg{ji}} + \alpha_c\right)} {\varGamma\left(\sum_{c=1}^K n^{(c)}_{i} + \alpha_c\right)}\\ & =\frac{ \left(n^{(k)}_i + \alpha_k - 1 \right)} {\left[\left( \sum_{c=1}^K n^{(c)}_i + \alpha_c \right) - 1 \right]} \\ & \quad \cdot \frac{\left(n^{(k)}_i + \alpha_{k} - 2 \right)}{\left[\left( \sum_{c=1}^K n^{(c)}_i + \alpha_c \right) -2 \right]} \end{aligned} $$
(22)
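As an illustration of how the cases above could be selected inside a sampler, the following sketch dispatches on whether the two nodes coincide and whether the two communities coincide. It is only a sketch under stated assumptions: the function name, argument layout and toy counts are hypothetical, the counts are assumed to include the current pair, and the branch for distinct nodes with $k \neq k^{\prime}$ is assumed to have the same shape as Eq. 20 with $k^{\prime}$ in the second fraction (Eq. 16 itself is not reproduced in this appendix).

```python
def community_factor(n_i, n_j, alpha, k, kp, same_node):
    """Ratio P(C) / P(C without the pair) for sender community k and
    receiver community kp.

    n_i, n_j: per-community counts of nodes i and j (current pair included).
    When same_node is True, only n_i is used.
    """
    a0 = sum(alpha)
    s_i = sum(n_i) + a0
    if not same_node:
        # i != j: each node owns its count vector and loses one count.
        # With k == kp this is Eq. 20.
        s_j = sum(n_j) + a0
        return ((n_i[k] + alpha[k] - 1) / (s_i - 1)
                * (n_j[kp] + alpha[kp] - 1) / (s_j - 1))
    if k != kp:
        # i == j, two different communities: Eq. 21.
        return ((n_i[k] + alpha[k] - 1) / (s_i - 1)
                * (n_i[kp] + alpha[kp] - 1) / (s_i - 2))
    # i == j, same community counted twice: Eq. 22.
    return ((n_i[k] + alpha[k] - 1) / (s_i - 1)
            * (n_i[k] + alpha[k] - 2) / (s_i - 2))

# Toy counts for K = 2 communities (hypothetical values).
alpha = [1.0, 1.0]
n_i = [3, 1]
n_j = [2, 2]

f20 = community_factor(n_i, n_j, alpha, k=0, kp=0, same_node=False)
f21 = community_factor(n_i, None, alpha, k=0, kp=1, same_node=True)
f22 = community_factor(n_i, None, alpha, k=0, kp=0, same_node=True)
print(f20, f21, f22)
```

The self-loop branches differ from the two-node branch only in the second denominator, which is decremented by 2 because both removed counts come from the same vector.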

Appendix 5: Link exceptions

The calculation of the first factor in the right-hand side of Eq. 12 does not meaningfully vary with the values of $R^{(f)}_{ij}$ and $R^{(t)}_{ji}$. Equations 13 and 14 were obtained earlier under the assumption that $R^{(f)}_{ij} \neq R^{(t)}_{ji}$. The resulting equations when $R^{(f)}_{ij} = R^{(t)}_{ji}$ are similar to Eqs. 13 and 14, with the only difference that $n_{r,r^{\prime}}$ is replaced with $n_{r,r}$.

Appendix 6: Parameter estimation

Once the Markov chain run by the Gibbs sampler reaches its stationary distribution, the posterior estimates of the parameters $\varTheta, \,\varPhi$ and $\varXi$ can be computed by conditioning them on their respective Markov blankets (i.e., the portions of the directed graphical model in Fig. 1 formed by the parents, children and co-parents of the parameters themselves) and applying Bayes' theorem. The derivations of the estimates for the multinomial parameters $\varTheta$ and $\varPhi$ and for the Bernoulli parameters $\varXi$ are reported next.

Multinomial parameters The probability of ${\varvec{\vartheta}}_i$ given the community assignments $\bf{C}$ and the hyperparameter ${\varvec{\alpha}}$ is

$$ \begin{aligned} P(\varvec{\vartheta}_i| {\bf{C}}, \varvec{\alpha}) & = \frac{\prod_{k=1}^K \vartheta_{i,k}^{n_i^{(k)}} \cdot \frac{1}{\Updelta(\varvec{\alpha})}\prod_{k=1}^K \vartheta_{i,k}^{\alpha_k - 1} }{ \frac{ \Updelta({\bf{n}}_i + \varvec{\alpha}) }{ \Updelta(\varvec{\alpha}) } } \\ & =\frac{ 1 }{ \Updelta({\bf{n}}_i + \varvec{\alpha}) } \prod_{k=1}^K \vartheta_{i,k}^{n_i^{(k)} + \alpha_k -1}\\ & ={\text{Dirichlet}} (\varvec{\vartheta}_i | {\bf{n}}_i + \varvec{\alpha}) \end{aligned} $$

(23)

By using the expectation of the Dirichlet distribution as in Heinrich (2008), one obtains Eq. 6:

$$ \vartheta_{i,k} = \frac{n_i^{(k)} + \alpha_k}{\sum_{k^{\prime}=1}^K n_i^{(k^{\prime})} + \alpha_{k^{\prime}}} $$

(24)

Analogously, the probability of ${\varvec{\varphi}}_k$ given the community assignments $\bf{C}$, the role assignments $\bf{R}$ and the hyperparameter ${\varvec{\beta}}$ is

$$ \begin{aligned} P(\varvec{\varphi}_k| {\bf{C}}, {\bf{R}}, \varvec{\beta}) & = \frac{ \prod_{l=1}^H \varphi_{k,l}^{n^{(l)}_k} \cdot \frac{ 1 }{ \Updelta(\varvec{\beta}) } \prod_{l=1}^H \varphi^{\beta_l - 1}_{k,l}}{\frac{\Updelta({\bf{n}}_k + \varvec{\beta})}{ \Updelta(\varvec{\beta})}}\\ & = \frac{1}{\Updelta({\bf{n}}_k + \varvec{\beta})} \prod_{l=1}^H \varphi^{n^{(l)}_k + \beta_l - 1}_{k,l}\\ & = {\rm Dirichlet}(\varvec{\varphi}_k | {\bf{n}}_k + \varvec{\beta}) \end{aligned} $$

(25)

By computing the expectation of the Dirichlet distribution we find Eq. 7:

$$ \varphi_{k,r} = \frac{n^{(r)}_k + \beta_r}{\sum^H_{r^{\prime}=1} n^{(r^{\prime})}_{k} + \beta_{r^{\prime}}} $$

(26)

Bernoulli parameters The probability of $\xi_{r,r^{\prime}}$ given the adjacency matrix $\bf{L}$, the role assignments $\bf{R}$ and the hyperparameter ${\varvec{\gamma}}$ is

$$ \begin{aligned} P(\xi_{r,r^{\prime}}| {\bf{L}}, {\bf{R}}, \varvec{\gamma}) & = \frac{ \left (\xi_{r,r^{\prime}} \right )^{n_{r,r^{\prime}}} \cdot \left (1 - \xi_{r,r^{\prime}} \right )^{\overline{n}_{r,r^{\prime}}} \cdot {\rm Beta}(\xi_{r,r^{\prime}} | \varvec{\gamma})} {\frac{\varGamma(n_{r,r^{\prime}} + \gamma_1) \varGamma(\overline{n}_{r,r^{\prime}} + \gamma_2) \varGamma(\gamma_1+\gamma_2)} {\varGamma(n_{r,r^{\prime}} + \gamma_1 + \overline{n}_{r,r^{\prime}} + \gamma_2) \varGamma(\gamma_1) \varGamma(\gamma_2)}}\\ & = \frac{\varGamma(n_{r,r^{\prime}} + \gamma_1 + \overline{n}_{r,r^{\prime}} + \gamma_2)}{\varGamma(n_{r,r^{\prime}} + \gamma_1) \varGamma(\overline{n}_{r,r^{\prime}} + \gamma_2)}\\ & \quad \cdot\left (\xi_{r,r^{\prime}} \right )^{n_{r,r^{\prime}} + \gamma_1 - 1} \cdot \left( 1 - \xi_{r,r^{\prime}} \right )^{\overline{n}_{r,r^{\prime}}+\gamma_2-1}\\ & = {\rm Beta}(\xi_{r,r^{\prime}} | n_{r,r^{\prime}} + \gamma_1, \overline{n}_{r,r^{\prime}}+\gamma_2). \end{aligned} $$

(27)

By taking the expectation of the Beta distribution we obtain Eq. 8:

$$ \xi_{r,r^{\prime}} = \frac{n_{r,r^{\prime}} + \gamma_1}{n_{r,r^{\prime}} + \gamma_1 + \overline{n}_{r,r^{\prime}} + \gamma_2}. $$

(28)
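The point estimates in Eqs. 24, 26 and 28 are posterior means, so they can be read off the count statistics of a single Gibbs sample. The sketch below shows one way to compute them; the helper names and the toy counts are hypothetical, and $n_{r,r^{\prime}}$ and $\overline{n}_{r,r^{\prime}}$ are assumed here to count, respectively, the linked and unlinked node pairs assigned to roles $r$ and $r^{\prime}$.

```python
def dirichlet_mean(counts, prior):
    """Mean of Dirichlet(counts + prior); the form shared by Eqs. 24 and 26."""
    total = sum(counts) + sum(prior)
    return [(c + p) / total for c, p in zip(counts, prior)]

def beta_mean(n_link, n_nolink, gamma1, gamma2):
    """Mean of Beta(n_link + gamma1, n_nolink + gamma2), as in Eq. 28."""
    return (n_link + gamma1) / (n_link + gamma1 + n_nolink + gamma2)

# Toy sufficient statistics (hypothetical values).
theta_i = dirichlet_mean([3, 1], [1.0, 1.0])        # node i over K = 2 communities
phi_k = dirichlet_mean([4, 2, 5], [0.5, 0.5, 0.5])  # community k over H = 3 roles
xi_rr = beta_mean(7, 13, 1.0, 1.0)                  # link probability for one role pair

print(theta_i, phi_k, xi_rr)
```

Each estimate is a smoothed relative frequency: the hyperparameters act as pseudo-counts, so a component with zero observed count still receives nonzero probability.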

About this article

Cite this article

Costa, G., Ortale, R. Probabilistic analysis of communities and inner roles in networks: Bayesian generative models and approximate inference. Soc. Netw. Anal. Min. 3, 1015–1038 (2013). https://doi.org/10.1007/s13278-013-0130-z

Received: 30 December 2012
Revised: 07 May 2013
Accepted: 12 July 2013
Published: 31 August 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s13278-013-0130-z
