Dealing with reciprocity in dynamic stochastic block models
Introduction
A number of social, behavioral, and biological phenomena can be naturally represented in terms of networks. In this literature, the relation between units, that is, “actors” or “nodes”, is the main target of inference, and statistical models for the analysis of these relations have attracted growing interest. Most methods available in the literature are tailored to static networks, where data consist of a single snapshot observed at a given occasion; see, among others, Goldenberg et al. (2010) and Amati et al. (2018) for a review.
Within this context, models for clustering and community discovery based on latent variables play an important role. Among these models, it is worth mentioning latent space models (Sarkar and Moore, 2006; Sarkar et al., 2007; Hoff, 2011; Lee and Priebe, 2011; Durante and Dunson, 2014), which project network nodes onto a reduced latent space where relations between them are explored, and Stochastic Block Models (SBMs; Holland et al., 1983; Snijders and Nowicki, 1997; Nowicki and Snijders, 2001; Daudin et al., 2008), which assume that each network node belongs to one of a finite number of distinct blocks. In this latter framework, relational variables are assumed to be independent conditional on the blocks of the nodes involved in the relation (local independence assumption). Blocks are defined by a discrete latent variable, with the probability of observing a connection between two nodes depending only on the corresponding block memberships. Therefore, nodes in the same block connect to all the others in a similar fashion and are said to be stochastically equivalent. The identification of these blocks provides a concise description of the network.
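As an illustration of the local independence assumption just described, the following minimal Python sketch simulates a static, directed SBM without self-loops. The function name `simulate_sbm`, the block proportions, and the edge probabilities are hypothetical choices made for this example only; they are not taken from the paper.

```python
import random

def simulate_sbm(n, block_probs, edge_probs, seed=0):
    """Draw block labels and a directed adjacency matrix without self-loops.

    block_probs[q]    : assumed probability that a node belongs to block q
    edge_probs[q][r]  : assumed probability of an edge from block q to block r
    """
    rng = random.Random(seed)
    k = len(block_probs)
    # Latent block of each node, drawn from a discrete distribution.
    blocks = rng.choices(range(k), weights=block_probs, k=n)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            # Local independence: the edge i -> j depends only on the two blocks.
            p = edge_probs[blocks[i]][blocks[j]]
            adj[i][j] = 1 if rng.random() < p else 0
    return blocks, adj

# Hypothetical two-block example with dense within-block connections.
blocks, adj = simulate_sbm(
    n=30,
    block_probs=[0.5, 0.5],
    edge_probs=[[0.6, 0.05], [0.05, 0.4]],
)
```

Nodes sharing a block use the same row and column of `edge_probs`, which is what stochastic equivalence means in this setting.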
However, in some cases, the research interest may lie in the evolution of the network over time, provided that longitudinal network data are available. In this context, standard tools of analysis need to be extended to deal with observations repeatedly taken over time, that is, with multiple snapshots of the network observed at different time points. Although longitudinal data permit a deeper study of the phenomenon of interest, the dependence between measures taken on the same sample units represents a further challenge that must be addressed (e.g., Diggle et al., 2002).
The literature on models specifically tailored to dynamic networks is rather recent, with most proposals starting from approaches developed for static networks. In this article, we focus on extensions of the SBM for longitudinal data. In particular, Yang et al. (2011) developed a dynamic SBM with time-varying block memberships that evolve according to an unobservable Markov chain. The resulting model can be conceived as a particular type of hidden (latent) Markov model for dynamic networks (for general references, see Bartolucci et al., 2013; Zucchini et al., 2016). Xu and Hero (2014) further extended the dynamic SBM of Yang et al. (2011) by considering time-varying edge probabilities. The same model has been recently discussed by Matias and Miele (2017), who proposed a way to resolve the lack of identifiability due to label switching between time steps, together with a principled estimation approach. Finally, Xu (2015) proposed a model in which the presence of a relation at a given occasion directly influences future relation probabilities. An approach that lies between the dynamic latent space model and the dynamic SBM is the dynamic mixed-membership SBM of Xing et al. (2010) and Ho et al. (2011), in which each node may have partial membership in different blocks.
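The idea of block memberships driven by an unobservable Markov chain can be sketched as follows. This is a minimal illustration, assuming a first-order chain that is homogeneous over time and independent across nodes; the function name `simulate_memberships` and all probability values are hypothetical.

```python
import random

def simulate_memberships(n, n_times, init_probs, trans_probs, seed=0):
    """Simulate one latent block path per node from a first-order Markov chain.

    init_probs[q]      : assumed probability of block q at the first occasion
    trans_probs[q][r]  : assumed probability of moving from block q to block r
    """
    rng = random.Random(seed)
    k = len(init_probs)
    paths = []
    for _ in range(n):
        # Block occupied at the first occasion.
        state = rng.choices(range(k), weights=init_probs)[0]
        path = [state]
        for _ in range(1, n_times):
            # The next block depends only on the current one.
            state = rng.choices(range(k), weights=trans_probs[state])[0]
            path.append(state)
        paths.append(path)
    return paths

# Hypothetical example: two fairly persistent blocks, five nodes, four occasions.
paths = simulate_memberships(
    n=5,
    n_times=4,
    init_probs=[0.5, 0.5],
    trans_probs=[[0.9, 0.1], [0.2, 0.8]],
)
```

Given the simulated paths, the network at each occasion could then be generated block-wise, exactly as in a static SBM applied to the memberships of that occasion.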
Alternatives to the dynamic SBM include the dynamic exponential random graph model for the analysis of social networks observed in discrete time (Robins and Pattison, 2001; Hanneke et al., 2010; Lusher et al., 2013, Chapter 10). Further references include the stochastic actor-oriented model (Snijders, 1996, 2001, 2005; Snijders et al., 2010) and the relational event model (Butts, 2008; Quintane et al., 2014), which are based on a continuous-time Markov process and on a time-to-event representation, respectively.
Extending the proposal of Yang et al. (2011), we develop an SBM for dynamic networks observed in discrete time in which the principal element of analysis is the dyad associated with each pair of nodes, conditional on the blocks they occupy at each occasion. In this way, we avoid restrictive assumptions about the dependence or independence between reciprocal relations and thus obtain greater flexibility than standard dynamic SBMs. The main assumption is that of conditional independence between the dyads, given the corresponding latent variables representing the blocks. Note, however, that marginal dependence between dyads is not ruled out, but is explained in a meaningful way by the latent variables. Therefore, triangulation or similar higher-order effects are accounted for. In agreement with Vu et al. (2013), among others, conditional independence between dyads offers at least three advantages: (i) it leads to simplifications in the estimation process; (ii) it facilitates data simulation; and (iii) it avoids the degeneracy issue that is frequently encountered when dealing with exponential random graph models.
To permit deeper insight into reciprocity effects, we propose to parametrically specify every dyadic relation between nodes in the network by means of a suitably formulated log-linear model, given the latent blocks. This allows us to distinguish between main and reciprocal effects, reflecting the tendency to observe asymmetric and symmetric relations, respectively, and hence to obtain information on the network’s cohesion. In particular, our approach allows us to formulate three different hypotheses: reciprocity may depend on the blocks to which the nodes involved in the relation belong; reciprocity is constant across blocks; and reciprocity is absent.
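The distinction between main and reciprocal effects can be illustrated with a standard log-linear model for two binary variables. The exact parameterization used in the paper is not shown in this excerpt, so the form below, the function names, and the parameter names `alpha_uv`, `alpha_vu`, and `beta` are assumptions made for this sketch.

```python
import math

def dyad_probs(alpha_uv, alpha_vu, beta):
    """Joint probabilities of a dyad (y_ij, y_ji) given the blocks (u, v).

    Assumed log-linear form: log p(a, b) = alpha_uv*a + alpha_vu*b + beta*a*b + const,
    where alpha_* are main effects and beta is the reciprocal effect.
    """
    weights = {
        (a, b): math.exp(alpha_uv * a + alpha_vu * b + beta * a * b)
        for a in (0, 1)
        for b in (0, 1)
    }
    total = sum(weights.values())  # normalizing constant
    return {pair: w / total for pair, w in weights.items()}

def log_odds_ratio(probs):
    """Log odds ratio of the 2x2 dyad table; equals beta under the form above."""
    return math.log(
        probs[(1, 1)] * probs[(0, 0)] / (probs[(1, 0)] * probs[(0, 1)])
    )

# Hypothetical parameter values: same main effects, without and with reciprocity.
no_recip = dyad_probs(-1.0, -0.5, 0.0)
with_recip = dyad_probs(-1.0, -0.5, 2.0)
```

Under this assumed form, the three hypotheses in the text correspond to letting `beta` vary with the pair of blocks, fixing a single `beta` for all pairs, or setting `beta` to zero, in which case the two directions of the dyad are conditionally independent.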
Estimation of the proposed model is challenging, because computing the log-likelihood function requires evaluating a multiple summation over all possible configurations of the latent variables. Clearly, this quickly becomes infeasible as the size of the network, and hence the number of latent variables, increases. In the literature, two main approaches are available to derive model parameter estimates. Markov Chain Monte Carlo (MCMC) algorithms represent a typical option in the Bayesian framework (e.g., Yang et al., 2011), while variational approximation methods represent a classical solution in the frequentist context (e.g., Yang et al., 2011; Matias and Miele, 2017). In this paper, we start from the proposal of Yang et al. (2011) and obtain parameter estimates through a Variational Expectation–Maximization (VEM) algorithm based on the assumption of posterior independence between dyads. We also propose an approximate inferential procedure for testing the absence of reciprocity effects in the network, or the hypothesis that the level of reciprocity is constant across blocks. Starting from the lower bound of the likelihood function required for variational inference, we show how an approximate Likelihood Ratio (LR) test statistic, which is simple to compute, may be used for inference on the reciprocity parameters.
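A rough sketch of how an approximate LR statistic might be formed from two maximized lower bounds is given below. The bound values are hypothetical, and the chi-square reference distribution with one degree of freedom (as would apply to a single constant reciprocity parameter set to zero) is an assumption of this illustration, not a result stated in this excerpt.

```python
import math

def approx_lr_statistic(bound_unrestricted, bound_restricted):
    """Twice the gap between the maximized lower bounds of two nested models."""
    return 2.0 * (bound_unrestricted - bound_restricted)

def chi2_sf_df1(x):
    """Survival function of a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical lower-bound values returned by two VEM runs.
stat = approx_lr_statistic(-1520.3, -1523.9)
# Assumed reference distribution: chi-square with 1 degree of freedom.
p_value = chi2_sf_df1(stat)
```

Because the lower bound replaces the intractable log-likelihood, the resulting statistic is only an approximation to the usual LR statistic, which is why its behavior needs to be checked empirically.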
Properties of the proposed inferential method, and in particular of the approximate LR test, are investigated via simulation and through the analysis of two benchmark datasets in the dynamic network literature: the Newcomb Fraternity network and the Enron network. The results show the potential of the proposed approach. The R implementation of the proposed estimation algorithm is available upon request.
The paper is organized as follows. Section 2 introduces the dynamic SBM according to the initial proposal of Yang et al. (2011) and then illustrates the proposed extension to deal with different forms of reciprocity. Section 3 describes the VEM algorithm for parameter estimation and introduces the approximate LR test for specific hypotheses on the type of reciprocity. The results of the simulation study and of the real data applications are provided in Sections 4 and 5, respectively. The last section contains some concluding remarks and outlines potential future developments.
Dynamic stochastic block models
For a network of individuals observed at a number of time occasions, consider, for each ordered pair of nodes and each occasion, a binary response variable that is equal to 1 if there exists an edge from the first node to the second at that occasion and is equal to 0 otherwise. Moreover, consider the binary adjacency matrix recorded at each occasion, which summarizes the relations between nodes. Here, we focus on directed networks without self-loops, so that relations are not constrained to be …
Model inference
Consider the overall set of latent variables in the model; based on the assumptions introduced so far, the observed network distribution is obtained by marginalizing out all these latent variables from the joint distribution of the observed network and the latent variables, that is, by summing this joint distribution over the support of the latent variables. Computation of the network distribution …
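The marginalization described above can be written out as follows. The original symbols are not recoverable from this excerpt, so the notation is assumed: $n$ nodes, $T$ occasions, $k$ blocks, $u_i^{(t)}$ for the block of node $i$ at occasion $t$, and $y_{ij}^{(t)}$ for the observed relation from $i$ to $j$. The factorization assumes independent first-order Markov chains across nodes and conditional independence between dyads, in line with the assumptions stated in the introduction.

```latex
% Assumed notation; a sketch, not the paper's own display.
\begin{align*}
p(\mathbf{y}) &= \sum_{\mathbf{u}} p(\mathbf{u})\, p(\mathbf{y} \mid \mathbf{u}), \\
p(\mathbf{u}) &= \prod_{i=1}^{n} \Big[ p\big(u_i^{(1)}\big)
   \prod_{t=2}^{T} p\big(u_i^{(t)} \mid u_i^{(t-1)}\big) \Big], \\
p(\mathbf{y} \mid \mathbf{u}) &= \prod_{t=1}^{T} \prod_{i<j}
   p\big(y_{ij}^{(t)}, y_{ji}^{(t)} \mid u_i^{(t)}, u_j^{(t)}\big).
\end{align*}
```

The sum over $\mathbf{u}$ runs over $k^{nT}$ configurations. For instance, with $k = 3$, $n = 20$, and $T = 5$ there are $3^{100} \approx 5.15 \times 10^{47}$ terms, which is why exact evaluation is out of reach even for small networks.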
Simulation study
In this section, we illustrate the results of a large-scale Monte Carlo simulation study. We focus both on the performance of the approximate LR statistic and on the clustering performance of the proposed approach, and consider several distinct experimental scenarios based on different network sizes and different values of the reciprocity parameter.
Empirical applications
In this section, we describe the application of the proposed methodology to two benchmark datasets in the network literature: the Newcomb Fraternity network and the Enron email network.
Concluding remarks
In this paper, we introduce a class of stochastic block models for dynamic networks where the standard hypothesis of independence between univariate responses is relaxed in favor of less stringent assumptions. In particular, the element of analysis is the set of dyads referred to ordered pairs of nodes and the assumption of conditional independence between them is considered. Obviously, marginal dependence, due for instance to triangulation effects, is not ruled out but can be, instead,
Acknowledgment
We acknowledge the financial support from grant RBFR12SHVV of the Italian Government (FIRB “Mixture and latent variable models for causal inference and analysis of socio-economic data”, 2012).
References (43)
Hierarchical multilinear models for multiway data. Comput. Statist. Data Anal. (2011)
Stochastic blockmodels: first steps. Social Networks (1983)
Introduction to stochastic actor-based models for network dynamics. Social Networks (2010)
Exponential random graph (p*) models for affiliation networks. Social Networks (2009)
Categorical Data Analysis (2013)
Social network modeling. Annu. Rev. Stat. Appl. (2018)
New consistent and asymptotically normal parameter estimates for random-graph mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. (2012)
Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. (2000)
A relational event framework for social action. Sociol. Methodol. (2008)
Theoretical Statistics
A mixture model for random graphs. Stat. Comput.
Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol.
Analysis of Longitudinal Data
Nonparametric Bayes dynamic modelling of relational data. Biometrika
Bootstrap methods: another look at the jackknife. Ann. Statist.
A survey of statistical network models. Found. Trends Mach. Learn.
Discrete temporal models of social networks. Electron. J. Stat.
Local structure in social networks. Sociol. Methodol.