Dealing with reciprocity in dynamic stochastic block models

https://doi.org/10.1016/j.csda.2018.01.010

Abstract

A stochastic block model for dynamic network data is introduced, where directed relations among a set of nodes are observed at different time occasions and the blocks are represented by a sequence of latent variables following a Markov chain. Dyads are explicitly modeled conditional on the states occupied by both nodes involved in the relation. With respect to the approaches already available in the literature, the main focus is on reciprocity. In this regard, three different parameterizations are proposed in which: (i) reciprocity is allowed to depend on the blocks of the nodes in the dyad; (ii) reciprocity is assumed to be constant across blocks; and (iii) reciprocity is ruled out. The assumption of conditional independence between dyads (referred to different pairs of nodes and time occasions) given the latent blocks is always retained. Given the complexity of the model, inference on its parameters is based on a variational approach, where a lower bound of the log-likelihood function is maximized instead of the intractable full model log-likelihood. An approximate likelihood ratio test statistic is proposed which compares the value at convergence of this lower bound under different model specifications. This allows us to formally test for both the hypothesis of no reciprocity and that of constant reciprocity with respect to the latent blocks. The proposed approach is illustrated via a simulation study based on different scenarios. The application to two benchmark datasets in the social network literature is also proposed to illustrate the effectiveness of the proposal in studying reciprocity and identifying groups of nodes having a similar social behavior.

Introduction

A number of social, behavioral, and biological phenomena can be naturally represented in terms of networks. In this literature, the relation between units, that is, “actors” or “nodes”, is the main target of inference, and statistical models for the analysis of these relations have attracted growing interest. Most methods available in the literature are tailored to deal with static networks, where data consist of a single snapshot observed at a given occasion; see, among others, Goldenberg et al. (2010) and Amati et al. (2018) for reviews.

Within this context, models for clustering and community discovery based on latent variables play an important role. Among these models, it is worth mentioning latent space models (Sarkar and Moore, 2006; Sarkar et al., 2007; Hoff, 2011; Lee and Priebe, 2011; Durante and Dunson, 2014), which project network nodes on a reduced latent space where relations between them are explored, and Stochastic Block Models (SBMs; Holland et al., 1983; Snijders and Nowicki, 1997; Nowicki and Snijders, 2001; Daudin et al., 2008), which assume that network nodes belong to one of k distinct blocks. In this latter framework, relational variables are assumed to be independent conditional on the blocks of the nodes involved in the relation (local independence assumption). Blocks are defined by a discrete latent variable, with the probability of observing a connection between two nodes depending only on the corresponding block memberships. Therefore, nodes in the same block connect to all the others in a similar fashion and are said to be stochastically equivalent. The identification of these blocks provides a concise description of the network.

However, in some cases, the research interest may be on the evolution of the network over time, provided that longitudinal network data are available. In this context, standard tools of analysis need to be extended to deal with observations repeatedly taken over time, that is, with multiple snapshots of the network observed at different time points. Although longitudinal data permit a deeper study of the phenomenon of interest, the dependence between measures taken on the same sample units represents a further challenge that has to be faced (e.g., Diggle et al., 2002).

The literature on models specifically tailored to deal with dynamic networks is rather recent, with most proposals starting from approaches developed for static networks. In this article, we focus on extensions of the SBM for longitudinal data. In particular, Yang et al. (2011) developed a dynamic SBM in which block memberships evolve over time according to an unobservable Markov chain. The resulting model can be conceived as a particular type of hidden (latent) Markov model for dynamic networks (for general references, see Bartolucci et al., 2013; Zucchini et al., 2016). Xu and Hero (2014) further extended the dynamic SBM of Yang et al. (2011) by considering time-varying edge probabilities. The same model has been recently discussed by Matias and Miele (2017), who proposed a way to resolve the lack of identifiability due to label switching between time steps, together with a well-principled estimation approach. Finally, Xu (2015) proposed a model in which the presence of a relation at a given occasion directly influences future relation probabilities. An approach that lies between the dynamic latent space model and the dynamic SBM is the dynamic mixed-membership SBM of Xing et al. (2010) and Ho et al. (2011). In this context, each node may have partial membership in different blocks.
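As a rough illustration of the latent Markov chain idea described above, the following Python sketch draws block memberships for each node over time. The function name `simulate_memberships`, the argument names, and the numerical values are hypothetical and chosen only for illustration; they are not taken from any of the cited papers.

```python
import random

def simulate_memberships(n, T, init_probs, trans_probs, seed=0):
    """Draw block labels for n nodes over T occasions.

    init_probs[u]     : probability of starting in block u (assumed name)
    trans_probs[u][v] : probability of moving from block u to block v
    Returns a list of n paths, each a list of T block labels.
    """
    rng = random.Random(seed)
    blocks = list(range(len(init_probs)))
    paths = []
    for _ in range(n):
        # First occasion: draw from the initial distribution.
        path = rng.choices(blocks, weights=init_probs)
        # Later occasions: draw from the row of the current block.
        for _ in range(1, T):
            path += rng.choices(blocks, weights=trans_probs[path[-1]])
        paths.append(path)
    return paths

# Illustrative values only: two blocks with fairly persistent memberships.
paths = simulate_memberships(
    n=5, T=4,
    init_probs=[0.5, 0.5],
    trans_probs=[[0.9, 0.1], [0.2, 0.8]],
)
```

Each node follows its own chain independently of the others, which is the structure that makes the model a latent Markov model at the node level.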

Alternative proposals to the dynamic SBM are represented by the dynamic exponential random graph model for the analysis of social networks observed in discrete time (Robins and Pattison, 2001; Hanneke et al., 2010; Lusher et al., 2013, Chapter 10). Further references include the stochastic actor-oriented model (Snijders, 1996; Snijders, 2001; Snijders, 2005; Snijders et al., 2010) and the relational event model (Butts, 2008; Quintane et al., 2014), which are based on a continuous time Markov process and on a time-to-event representation, respectively.

Extending the proposal of Yang et al. (2011), we develop an SBM for dynamic networks observed in discrete time in which the principal element of analysis is the dyad associated with each pair of nodes, conditional on the blocks they occupy at each occasion. Therefore, we avoid restrictive assumptions about the dependence/independence between reciprocal relations and, thus, obtain higher flexibility than that of standard dynamic SBMs. The main assumption is that of conditional independence between the dyads, given the corresponding latent variables representing the blocks. Note, however, that marginal dependence between dyads is not ruled out, but is explained in a meaningful way by the latent variables. Therefore, triangulation or similar higher-order effects are accounted for. In agreement with Vu et al. (2013), among others, conditional independence between dyads offers at least three advantages: (i) it leads to simplifications in the estimation process; (ii) it facilitates data simulation; and (iii) it avoids the degeneracy issue which is frequently encountered when dealing with exponential random graph models.

To permit a deeper insight into reciprocity effects, we propose to parametrically specify every dyadic relation between nodes in the network by means of a suitably formulated log-linear model, given the latent blocks. In this way, we may distinguish between main and reciprocal effects reflecting the tendency to observe asymmetric and symmetric relations, respectively, and thereby obtain information on the network’s cohesion. In particular, our approach allows us to formulate three different hypotheses: (i) reciprocity may depend on the blocks to which the nodes involved in the relation belong; (ii) reciprocity is constant across blocks; and (iii) reciprocity is absent.
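To make the role of main and reciprocal effects concrete, here is a minimal Python sketch of one plausible log-linear form for a single dyad. The exact parameterization used in the paper is not shown in this excerpt, so the form below, along with the names `alpha_uv`, `alpha_vu`, and `beta_uv`, should be read as an assumption: the log-probability of the pair (y_ij, y_ji) is taken to be alpha_uv*y_ij + alpha_vu*y_ji + beta_uv*y_ij*y_ji up to an additive constant.

```python
import math

def dyad_probs(alpha_uv, alpha_vu, beta_uv):
    """Return {(y_ij, y_ji): probability} for one dyad given the blocks.

    Assumed form (not confirmed by the source): log p equals
    alpha_uv*y_ij + alpha_vu*y_ji + beta_uv*y_ij*y_ji
    up to an additive normalizing constant.
    """
    outcomes = [(0, 0), (1, 0), (0, 1), (1, 1)]
    weights = {
        (a, b): math.exp(alpha_uv * a + alpha_vu * b + beta_uv * a * b)
        for a, b in outcomes
    }
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

# (i) block-specific reciprocity: beta_uv may differ across block pairs.
p_block = dyad_probs(-1.0, -0.5, beta_uv=2.0)
# (ii) constant reciprocity would reuse one beta for every block pair.
# (iii) no reciprocity: beta_uv = 0, so the two ties are independent.
p_none = dyad_probs(-1.0, -0.5, beta_uv=0.0)
```

Under this assumed form, setting the reciprocity term to zero makes the joint probability of a mutual tie equal to the product of the two marginal tie probabilities, which is what hypothesis (iii) expresses.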

Estimation of the proposed model is challenging, as computing the log-likelihood function would require the evaluation of a multiple summation defined over all possible configurations of the latent variables. Clearly, this quickly becomes infeasible as the size of the network, and hence the number of such latent variables, increases. In the literature, two main approaches are available to derive model parameter estimates. Markov Chain Monte Carlo (MCMC) algorithms represent a typical option in the Bayesian framework (e.g., Yang et al., 2011), while variational approximation methods represent a fairly standard solution in the frequentist context (e.g., Yang et al., 2011; Matias and Miele, 2017). In this paper, we start from the proposal of Yang et al. (2011) and obtain parameter estimates through a Variational Expectation–Maximization (VEM) algorithm based on the assumption of posterior independence between dyads. We also propose an approximate inferential procedure with the aim of testing for the absence of reciprocity effects in the network or for the hypothesis that the level of reciprocity is constant with respect to the blocks. Starting from the lower bound of the likelihood function required for variational inference, we show how an approximate Likelihood Ratio (LR) test statistic, which is simple to compute, may be used for inferential purposes on the reciprocity parameters.
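The idea of comparing converged lower bounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is taken to have the usual LR form with the variational lower bounds in place of the log-likelihoods, a chi-square reference distribution is assumed, and the bound values are hypothetical numbers rather than results from the paper. One degree of freedom is used here because, in the assumed setup, moving from constant reciprocity to no reciprocity fixes a single parameter.

```python
import math

def approx_lr_statistic(bound_general, bound_restricted):
    """Twice the gap between the converged lower bounds of two nested models."""
    return 2.0 * (bound_general - bound_restricted)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

# Hypothetical converged lower bounds, for illustration only.
bound_constant = -1520.3  # constant reciprocity across blocks
bound_none = -1534.9      # no reciprocity
lr = approx_lr_statistic(bound_constant, bound_none)
p_value = chi2_sf_1df(lr)  # relies on the assumed chi-square reference
```

Because the lower bound is not the log-likelihood itself, any p-value obtained this way is approximate, which is why the behavior of the statistic is studied by simulation.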

Properties of the proposed inferential method, and in particular of the approximate LR test, are investigated via simulation and through the analysis of two benchmark datasets in the dynamic network literature: the Newcomb Fraternity network and the Enron network. The results show the potential of the proposed approach. The R implementation of the proposed estimation algorithm is available upon request.

The paper is organized as follows. Section 2 introduces the dynamic SBM according to the initial proposal of Yang et al. (2011) and then illustrates the proposed extension to deal with different forms of reciprocity. Section 3 describes the VEM algorithm for parameter estimation and introduces the approximate LR test for specific hypotheses on the type of reciprocity. The results of the simulation study and of the real data applications are provided in Sections 4 and 5, respectively. The last section contains some concluding remarks and outlines potential future developments.

Dynamic stochastic block models

For a network of $n$ individuals observed at $T$ time occasions, let $Y_{ij}^{(t)}$, $i,j=1,\ldots,n$, $j\neq i$, denote a binary response variable which is equal to 1 if there exists an edge from node $i$ to node $j$ at occasion $t$ and is equal to 0 otherwise; $y_{ij}^{(t)}$ is used to denote a realization of $Y_{ij}^{(t)}$. Moreover, let $Y^{(t)}$ be the binary adjacency matrix recorded at occasion $t=1,\ldots,T$, which summarizes the relations between nodes. Here, we focus on directed networks without self-loops, so that $Y^{(t)}$ is not constrained to be

Model inference

Let $U=\{U_i,\ i=1,\ldots,n\}$ denote the overall set of latent variables in the model; based on the assumptions introduced so far, the observed network distribution is obtained by marginalizing out all these latent variables from the joint distribution of $Y$ and $U$. In particular, we have
$$p(Y)=\sum_{U}p(Y,U)=\sum_{U}p(Y\mid U)\,p(U),$$
where $\sum_{U}$ denotes the sum over the support of $U$ and
$$p(Y\mid U)=\prod_{i=1}^{n-1}\prod_{j=i+1}^{n}\prod_{t=1}^{T}p\bigl(y_{ij}^{(t)},y_{ji}^{(t)}\mid U_i^{(t)}=u_i^{(t)},U_j^{(t)}=u_j^{(t)}\bigr),\qquad p(U)=\prod_{i=1}^{n}\lambda_{u_i^{(1)}}\prod_{t=2}^{T}\pi_{u_i^{(t)}\mid u_i^{(t-1)}}.$$
Computation of the network distribution
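The marginalization over the latent variables can be written out directly for a very small network, which also shows why it does not scale: with k blocks, n nodes, and T occasions the sum has k^(nT) terms. The Python sketch below is a brute-force illustration only. The dyad model inside `dyad_prob` is an assumed log-linear form (the excerpt does not show the paper's parameterization), and the names `lam`, `pi`, `alpha`, and `beta` are hypothetical; `beta` is assumed symmetric in the two blocks.

```python
import itertools
import math

def dyad_prob(y_ij, y_ji, u, v, alpha, beta):
    """Assumed log-linear dyad model given blocks u (node i) and v (node j)."""
    def w(a, b):
        return math.exp(alpha[u][v] * a + alpha[v][u] * b + beta[u][v] * a * b)
    total = w(0, 0) + w(1, 0) + w(0, 1) + w(1, 1)
    return w(y_ij, y_ji) / total

def marginal_likelihood(Y, lam, pi, alpha, beta):
    """p(Y) by summing p(Y|U) p(U) over every latent configuration.

    Y[t][i][j] is the tie from i to j at occasion t.
    lam[u] is the initial probability of block u.
    pi[u][v] is the probability of block v now given block u before.
    """
    T, n, k = len(Y), len(Y[0]), len(lam)
    total = 0.0
    # Each configuration assigns one block to every (node, occasion) pair.
    for flat in itertools.product(range(k), repeat=n * T):
        u = [flat[i * T:(i + 1) * T] for i in range(n)]
        p = 1.0
        for i in range(n):  # latent Markov chain of node i
            p *= lam[u[i][0]]
            for t in range(1, T):
                p *= pi[u[i][t - 1]][u[i][t]]
        for t in range(T):  # conditionally independent dyads
            for i in range(n - 1):
                for j in range(i + 1, n):
                    p *= dyad_prob(Y[t][i][j], Y[t][j][i],
                                   u[i][t], u[j][t], alpha, beta)
        total += p
    return total

# Illustrative parameter values only: 2 nodes, 2 occasions, 2 blocks.
lam = [0.6, 0.4]
pi = [[0.8, 0.2], [0.3, 0.7]]
alpha = [[-1.0, 0.5], [0.2, -0.3]]
beta = [[1.0, 0.4], [0.4, -0.5]]
Y = [[[0, 1], [1, 0]], [[0, 0], [1, 0]]]
p_Y = marginal_likelihood(Y, lam, pi, alpha, beta)
```

Even this toy case enumerates 16 configurations; the exponential growth of the count is the reason approximate methods such as variational inference are used instead.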

Simulation study

In this section, we illustrate the results of a large-scale Monte Carlo simulation study. We focused on both the performance of the approximate LR statistic and the clustering performance of the proposed approach, and considered several distinct experimental scenarios based on different network sizes and different values of the reciprocity parameter.

Empirical applications

In this section, we describe the application of the proposed methodology to two benchmark datasets in the network literature: the Newcomb Fraternity network and the Enron email network.

Concluding remarks

In this paper, we introduce a class of stochastic block models for dynamic networks where the standard hypothesis of independence between univariate responses is relaxed in favor of less stringent assumptions. In particular, the element of analysis is the set of dyads referred to ordered pairs of nodes and the assumption of conditional independence between them is considered. Obviously, marginal dependence, due for instance to triangulation effects, is not ruled out but can be, instead,

Acknowledgment

We acknowledge the financial support from grant RBFR12SHVV of the Italian Government (FIRB “Mixture and latent variable models for causal inference and analysis of socio-economic data”, 2012).

References (43)

  • Butts, C.T., Leslie-Cook, A., Krivitsky, P.N., Bender-deMoll, S., 2016. networkDynamic: Dynamic Extensions for Network...
  • Cox, D.R. et al., 1979. Theoretical Statistics.
  • Daudin, J.-J. et al., 2008. A mixture model for random graphs. Stat. Comput.
  • Dempster, A.P. et al., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol.
  • Diggle, P. et al., 2002. Analysis of Longitudinal Data.
  • Durante, D. et al., 2014. Nonparametric Bayes dynamic modelling of relational data. Biometrika.
  • Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist.
  • Goldenberg, A. et al., 2010. A survey of statistical network models. Found. Trends Mach. Learn.
  • Hanneke, S. et al., 2010. Discrete temporal models of social networks. Electron. J. Stat.
  • Ho, Q., Song, L., Xing, E.P., 2011. Evolving cluster mixed-membership blockmodel for time-evolving networks. In:...
  • Holland, P.W. et al., 1976. Local structure in social networks. Sociol. Methodol.