A note on parallel sampling in Markov graphs

  • Original Paper
  • Computational Statistics

Abstract

The paper proposes the use of parallel computing for Markov graphs, a subclass of exponential random graph models in which the network statistics induce a conditional independence structure among the edges of the network. This conditional independence allows edges to be simulated in parallel on multiple computing cores. Simulation is needed in Markov models, since parameter estimation cannot be carried out analytically and instead requires simulation-based routines such as Markov chain Monte Carlo. Particularly in large networks, this can be computationally very demanding or even infeasible, so numerical enhancements that accelerate the computation are valuable.
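To make the idea of parallel edge simulation concrete, here is a minimal sketch in Python. It is not the authors' pergm implementation, and it rests on several assumptions: an undirected Markov graph with untransformed edge, 2-star and triangle counts, single-dyad Gibbs updates, and hypothetical helper names (`change_stats`, `cond_prob`, `node_disjoint_batch`, `gibbs_batch`, `gibbs_sequential`). Dyads that share no node do not enter each other's change statistics. As a result, computing all of their conditional probabilities from the same state, which is what independent workers would do, gives exactly the same draw as updating them one after another.

```python
import math
import random
from itertools import combinations

# Hypothetical sketch (not the pergm implementation): Gibbs updates for an
# undirected Markov graph with edge, 2-star and triangle counts, stored as a
# symmetric 0/1 adjacency matrix y with zero diagonal.

def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def change_stats(y, i, j):
    """Change in (edges, 2-stars, triangles) when dyad {i, j} goes from 0 to 1."""
    n = len(y)
    deg_i = sum(y[i][k] for k in range(n) if k != j)
    deg_j = sum(y[j][k] for k in range(n) if k != i)
    common = sum(y[i][k] * y[j][k] for k in range(n) if k not in (i, j))
    return (1, deg_i + deg_j, common)

def cond_prob(y, i, j, theta):
    """P(Y_ij = 1 | rest of the network) under the assumed model."""
    return logistic(sum(t * d for t, d in zip(theta, change_stats(y, i, j))))

def node_disjoint_batch(n, rng):
    """Random batch of dyads in which no two dyads share a node."""
    nodes = list(range(n))
    rng.shuffle(nodes)
    return [(nodes[k], nodes[k + 1]) for k in range(0, n - 1, 2)]

def gibbs_batch(y, batch, theta, uniforms):
    """All probabilities come from the same state before any dyad is written."""
    probs = [cond_prob(y, i, j, theta) for i, j in batch]  # parallelizable step
    for (i, j), p, u in zip(batch, probs, uniforms):
        y[i][j] = y[j][i] = 1 if u < p else 0

def gibbs_sequential(y, batch, theta, uniforms):
    """Reference version: one dyad at a time, always using the latest state."""
    for (i, j), u in zip(batch, uniforms):
        p = cond_prob(y, i, j, theta)
        y[i][j] = y[j][i] = 1 if u < p else 0

rng = random.Random(1)
n = 12
y = [[0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    if rng.random() < 0.3:
        y[i][j] = y[j][i] = 1

theta = (-1.0, 0.05, 0.2)  # illustrative values only
batch = node_disjoint_batch(n, rng)
uniforms = [rng.random() for _ in batch]
y_par = [row[:] for row in y]
y_seq = [row[:] for row in y]
gibbs_batch(y_par, batch, theta, uniforms)
gibbs_sequential(y_seq, batch, theta, uniforms)
print(y_par == y_seq)  # True: node-disjoint dyads give identical updates
```

In a real implementation, the list comprehension in `gibbs_batch` is the part that would be distributed over cores. The equality check is the property that justifies doing so.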

References

  • Bauer V (2016) pergm: parallel exponential random graph model simulation. https://github.com/VerenaMaier/pergm. Accessed 5 Dec 2018

  • Besag J (1972) Nearest-neighbor systems and the auto-logistic model for binary data. J R Stat Soc Ser B 34:75–83

  • Bhamidi S, Bresler G, Sly A (2011) Mixing time of exponential random graphs. Ann Appl Probab 21(6):2146–2170

  • Brockwell AE (2006) Parallel processing in Markov chain Monte Carlo simulation by pre-fetching. J Comput Graph Stat 15(1):246–261

  • Caimo A, Friel N (2011) Bayesian inference for exponential random graph models. Soc Netw 33(1):41–55

  • Dagum L, Menon R (1998) OpenMP: an industry-standard API for shared-memory programming. IEEE Comput Sci Eng 5(1):46–55

  • Eddelbuettel D, François R, Allaire J, Chambers J, Bates D, Ushey K (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40(8):1–18

  • Erdős P, Rényi A (1959) On random graphs. Publ Math Debr 6:290–297

  • Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839

  • Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832–842

  • Geyer C (1992) Practical Markov chain Monte Carlo. Stat Sci 7(4):473–483

  • Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233

  • Grama A, Karypis G, Kumar V, Gupta A (2003) Introduction to parallel computing, 2nd edn. Addison-Wesley, Boston

  • Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M (2014) ergm: fit, simulate and diagnose exponential-family models for networks. The Statnet Project (http://www.statnet.org), http://CRAN.R-project.org/package=ergm, R package version 3.6.1. Accessed 11 Feb 2018

  • Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J Am Stat Assoc 76(373):33–50

  • Hummel RM, Hunter DR, Handcock MS (2012) Improving simulation-based algorithms for fitting ERGMs. J Comput Graph Stat 21(4):920–939

  • Hunter DR, Handcock MS (2006) Inference in curved exponential family models for networks. J Comput Graph Stat 15(3):565–583

  • Hunter DR, Krivitsky PN, Schweinberger M (2012) Computational statistical methods for social network analysis. J Comput Graph Stat 21(4):856–882

  • Kolaczyk ED (2009) Statistical analysis of network data. Springer, New York

  • Koskinen J, Daraganova G (2013) Exponential random graph model fundamentals. In: Lusher D, Koskinen J, Robins G (eds) Exponential random graph models for social networks. Cambridge University Press, Cambridge, pp 49–76

  • Koskinen J, Wang P, Robins G, Pattison P (2018) Outliers and influential observations in exponential random graph models. Psychometrika 83(4):809–830. https://doi.org/10.1007/s11336-018-9635-8

  • Lauritzen SL (1996) Graphical models, vol 17. Clarendon Press, Oxford

  • Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/. Accessed 22 Jan 2016

  • Leskovec J, McAuley JJ (2012) Learning to discover social circles in ego networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, Curran Associates, Inc., pp 539–547. http://papers.nips.cc/paper/4532-learning-to-discover-social-circles-in-ego-networks.pdf. Accessed 12 Dec 2015

  • Lusher D, Koskinen J, Robins G (2013) Exponential random graph models for social networks. Cambridge University Press, Cambridge

  • Marino M, Stawinoga A (2011) Statistical methods for social networks: a focus on parallel computing. Metodol zv 8(1):57–77

  • Morris M, Handcock MS, Hunter D (2008) Specification of exponential-family random graph models: terms and computational aspects. J Stat Softw 24(1):1–24

  • Murray I, Ghahramani Z, MacKay D (2006) MCMC for doubly-intractable distributions. In: Proceedings of the 22nd annual conference on uncertainty in artificial intelligence (UAI-06), AUAI Press, Arlington, Virginia

  • Newman M, Barkema G (1999) Monte Carlo methods in statistical physics. Oxford University Press, New York

  • OpenMP Architecture Review Board (1998) OpenMP application program interface. http://www.openmp.org. Accessed 7 Sept 2015

  • Pattison PE, Robins GL, Snijders TA, Wang P (2013) Conditional estimation of exponential random graph models from snowball sampling designs. J Math Psychol 57(6):284–296

  • R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 5 Mar 2016

  • Ripley RM, Snijders TAB, Preciado P (2011) Manual for SIENA version 4.0. University of Oxford, Oxford

  • Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007) Recent developments in exponential random graph (p*) models for social networks. Soc Netw 29(2):192–215

  • Schweinberger M (2011) Instability, sensitivity, and degeneracy of discrete exponential families. J Am Stat Assoc 106(496):1361–1370

  • Schweinberger M, Krivitsky PN, Butts CT, Stewart J (2017) Exponential-family models of random graphs: inference in finite-, super-, and infinite population scenarios. arXiv e-prints arXiv:1707.04800

  • Snijders TAB (2002) Markov chain Monte Carlo estimation of exponential random graph models. J Soc Struct 3(2):1–40

  • Snijders TAB (2010) Conditional marginalization for exponential random graph models. J Math Sociol 34(4):239–252

  • Snijders TAB, Pattison PE, Robins GL, Handcock MS (2006) New specifications for exponential random graph models. Sociol Methodol 36(1):99–153

  • Thiemichen S, Kauermann G (2017) Stable exponential random graph models with non-parametric components for large dense networks. Soc Netw 49:67–80

  • Tierney L, Rossini AJ, Li N, Sevcikova H (2016) snow: simple network of workstations. R package version 0.4-2. https://CRAN.R-project.org/package=snow. Accessed 10 Mar 2016

  • Wang P, Pattison P, Robins G (2013) Exponential random graph model specifications for bipartite networks—a dependence hierarchy. Soc Netw 35:211–222

Author information

Correspondence to Verena Bauer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of asymptotic Markovian structure in log changes

The Markovian independence assumption is violated in the example when log changes are used instead of simple sums for statistics such as the 2-star or triangle. We can, however, prove that an asymptotic independence structure holds, which justifies parallel draws in large networks. We show that the conditional log odds ratio for \(P(Y_{i,j}, Y_{l,m}|y_{rest})\) is of order \(O_p(N^{-1})\) if \(\{i,j\} \cap \{l,m\} = \emptyset\), that is, if the two ties do not share a node, and of order \(O_p(1)\) otherwise. Here \(y_{rest}\) denotes the network excluding the edges \(Y_{i,j}\) and \(Y_{l,m}\). In other words, ties that do not share a node are asymptotically independent in large networks. To simplify notation, we first consider \(Y_{1,2}\) and \(Y_{3,4}\). The log odds ratio for these two ties, which do not share a node, is given by

where

Note that \(a_{i,j}(N)\) and \(b_{i,j}(N)\) are of order \(O_p(N)\), where N is the number of nodes. A first-order Taylor series approximation yields

$$\begin{aligned} \log \left( 2 + a_{i,j}(N) \right) - \log \left( a_{i,j}(N) \right) \approx \frac{2}{ a_{i,j}(N)}. \end{aligned} \tag{14}$$

By (14), the component involving \(a_{i,j}(N)\) is of order \(O_p(N^{-1})\). Similarly, the component multiplied by \(\theta _3\) is of order \(O\left( b_{i,j}^{-1}(N)\right) \) and hence vanishes as \(N \rightarrow \infty \). This implies that \(Y_{1,2}\) and \(Y_{3,4}\) are asymptotically independent given the rest of the network.
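Relation (14) is a first-order approximation, not an identity. Exactly, \(\log(2 + a) - \log(a) = \log(1 + 2/a)\), and the error of \(2/a\) is about \(2/a^2\). The short numerical check below makes this visible. It uses arbitrary values of \(a\) chosen for illustration.

```python
import math

# Numerical check of the first-order approximation (14):
# log(2 + a) - log(a) = log(1 + 2/a) ≈ 2/a, with an error of about 2/a^2.
rows = []
for a in (10, 100, 1000, 10000):
    exact = math.log1p(2 / a)        # equals log(2 + a) - log(a), computed stably
    approx = 2 / a
    scaled_error = (approx - exact) * a ** 2
    rows.append((a, exact, approx, scaled_error))
    print(f"a={a:6d}  exact={exact:.8f}  approx={approx:.8f}  a^2*error={scaled_error:.4f}")
```

The scaled error approaches 2, so the approximation error is of order \(a^{-2}\). When \(a_{i,j}(N)\) grows like N, the difference itself is therefore of order \(N^{-1}\).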

Let us now consider the pair \(Y_{1,2}\) and \(Y_{1,3}\), that is, ties that share a node. The conditional log odds of \(Y_{1,2}\) is given by

$$\begin{aligned} \text {Log odds}= & {} \log \left( \frac{P\left( y_{1,2} = 1, y_{1,3}, y_{rest}\right) }{P\left( y_{1,2} = 0, y_{1,3}, y_{rest} \right) } \right) \\= & {} \frac{1}{2} \theta _1 \left( \log \left( y_{1,3} + \sum \limits _{\begin{array}{c} k \ne 2,3 \end{array}} y_{1,k} + \sum \limits _{\begin{array}{c} k \ne 1,2 \end{array}} y_{2,k} \right) \right. \\&\left. +\, \sum \limits _{\begin{array}{c} j>2 \end{array}} y_{1,j} \left[ \log \left( 1 + y_{1,3} + \underbrace{\sum \limits _{\begin{array}{c} k \ne 2,3,j \end{array}} y_{1,k} + \sum \limits _{\begin{array}{c} k \ne j,1 \end{array}} y_{k,j}}_{c_{1,j}(N)} \right) \right. \right. \\&\left. \left. -\log \left( y_{1,3} + \underbrace{\sum _{k \ne 2,3,j} y_{1,k} + \sum _{k \ne j,1} y_{k,j}}_{c_{1,j}(N)} \right) \right] \right) \\&+\,\frac{1}{3} \theta _2 \left( \log \left( y_{1,3} y_{2,3} + \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \right. \\&\left. +\, \sum _{j> 2} y_{1,j} \log \left( 1 \cdot y_{j,2} + y_{1,3} y_{j,3} + \underbrace{\sum _{k \ne 1,j,2,3} y_{1,k} y_{j,k}}_{d_{1,j}(N)} \right) \right. \\&\left. - \,\sum _{j>2} y_{1,j} \log \left( y_{1,3} y_{j,3} + \underbrace{\sum _{k \ne 1,j,2,3} y_{1,k} y_{j,k}}_{d_{1,j}(N)} \right) \right) \\= & {} \frac{1}{2} \theta _1 \left( \log \left( y_{1,3} + \sum \limits _{\begin{array}{c} k \ne 2,3 \end{array}} y_{1,k} + \sum \limits _{\begin{array}{c} k \ne 1,2 \end{array}} y_{2,k} \right) \right. \\&\left. + \,\sum \limits _{\begin{array}{c} j>2 \end{array}} y_{1,j} \Big [ \log \left( 1 + y_{1,3} + c_{1,j}(N) \right) -\log \left( y_{1,3} + c_{1,j}(N) \right) \Big ] \right) \\&+\frac{1}{3} \theta _2 \left( \log \left( y_{1,3} y_{2,3} + \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \right. \\&\left. + \,\sum _{j> 2} y_{1,j} \log \left( 1 \cdot y_{j,2} + y_{1,3} y_{j,3} + d_{1,j}(N) \right) \right. \\&\left. -\, \sum _{j>2} y_{1,j} \log \left( y_{1,3} y_{j,3} + d_{1,j}(N) \right) \right) \end{aligned}$$

Hence, the log odds ratio for ties sharing a node is

$$\begin{aligned}&\text {Log odds (B)} \\&\quad = \log \left( \frac{P\left( y_{1,2} = 1, y_{1,3} = 1, y_{rest}\right) P\left( y_{1,2} = 0, y_{1,3} = 0, y_{rest} \right) }{ P\left( y_{1,2} = 0, y_{1,3} = 1, y_{rest} \right) P\left( y_{1,2} = 1, y_{1,3} = 0, y_{rest} \right) } \right) \\&\quad = \frac{1}{2} \theta _1 \left( \log \left( 1 + \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1,2} y_{2,k} \right) - \log \left( \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1, 2} y_{2,k} \right) \right. \\&\quad \quad \left. +\,\sum _{j>2} y_{1,j} \Big [ \log \left( 2 + c_{1,j}(N)\right) - \log \left( c_{1,j}(N) \right) \Big ] \right) \\&\quad \quad + \,\frac{1}{3} \theta _2 \left( \log \left( 1 \cdot y_{2,3} +\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) - \log \left( \sum _{k \ne 1,2,3}y_{1,k} y_{2,k}\right) \right. \\&\qquad \left. +\, \sum _{j>2} y_{1,j} \Big [ \log \left( 1 \cdot y_{j,2} + 1 \cdot y_{j,3} + d_{1,j}(N) \right) - \log \left( 1 \cdot y_{j,3} + d_{1,j}(N) \right) \right. \\&\qquad \left. -\, \log \left( 1 \cdot y_{j,2} + d_{1,j}(N) \right) + \log \left( d_{1,j}(N) \right) \Big ] \right) \end{aligned}$$

The first component in the bracketed term multiplied by \(\theta _2\) is of order \(O_p(N^{-1})\), since

$$\begin{aligned} \log \left( 1\cdot y_{2,3} +\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \approx \log \left( \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) + \underbrace{\frac{1}{\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} } y_{2,3}}_{O_p(N^{-1})} \end{aligned}$$

The second component in the bracketed term, however, is of order \(O_p(1)\). Each summand is of order \(O_p(N^{-1})\), but the outer sum runs over \(O_p(N)\) terms, so the component as a whole is of order \(O_p(1)\). The bracketed term multiplied by \(\theta _1\) behaves in the same way.
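The scaling argument can also be checked numerically. The sketch below evaluates both components of the bracketed term at \(\theta _1\) in (B) on simulated graphs of increasing size. The graph model is an assumption made only for illustration: an Erdős–Rényi graph with density 0.1, not one of the paper's settings. The paper's nodes 1, 2 and 3 are mapped to indices 0, 1 and 2, and the helper names are hypothetical.

```python
import math
import random

def er_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi G(n, p) graph."""
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                y[i][j] = y[j][i] = 1
    return y

def theta1_components(y):
    """Both components of the theta_1 bracket in (B); paper nodes 1, 2, 3 -> 0, 1, 2."""
    n = len(y)
    s = (sum(y[0][k] for k in range(n) if k not in (1, 2))
         + sum(y[1][k] for k in range(n) if k not in (0, 1)))
    first = math.log(1 + s) - math.log(s)          # single term of order 1/N
    second = 0.0
    for j in range(2, n):                          # j > 2 in 1-based labels
        if y[0][j]:
            c = (sum(y[0][k] for k in range(n) if k not in (1, 2, j))
                 + sum(y[k][j] for k in range(n) if k not in (j, 0)))  # c_{1,j}(N)
            second += math.log(2 + c) - math.log(c)  # about pN terms of order 1/N
    return first, second

rng = random.Random(2019)
results = {}
for n in (200, 400, 800, 1600):
    results[n] = theta1_components(er_graph(n, 0.1, rng))
    print(f"N={n:5d}  first={results[n][0]:.4f}  second={results[n][1]:.4f}")
```

As N grows, the first component shrinks toward zero. The second component stays near a constant, roughly 1 in this illustrative setting.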

Combining (A) and (B), we conclude that the conditional log odds ratio is

$$\begin{aligned}&\log \left( \frac{P\left( y_{i,j} = 1, y_{l,m} = 1 | y_{rest}\right) P\left( y_{i,j} = 0, y_{l,m} = 0 | y_{rest} \right) }{P\left( y_{i,j} = 0, y_{l,m} = 1| y_{rest} \right) P\left( y_{i,j} = 1, y_{l,m} = 0| y_{rest} \right) }\right) \\&\quad ={\left\{ \begin{array}{ll} O_p(N^{-1}) &{} \text {if}\;\; \{i,j\} \cap \{l,m\} = \emptyset \\ O_p(1) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

In other words, we asymptotically obtain a Markovian independence structure, which justifies drawing ties that do not share a node in parallel.
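For comparison, with untransformed counts the Markov property holds exactly rather than asymptotically. The sketch below computes the conditional log odds ratio of two dyads by brute force from the unnormalized log-density \(\theta ^\top s(y)\) of a standard Markov graph with edge, 2-star and triangle counts. The parameter values and the small example graph are made up for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Standard Markov graph with untransformed counts (edges, 2-stars, triangles);
# this is not the log-change model analysed in this appendix.

def dyad(a, b):
    return frozenset((a, b))

def markov_stats(edges, n):
    """(edges, 2-stars, triangles) of an undirected graph on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    two_stars = sum(math.comb(len(adj[v]), 2) for v in range(n))
    triangles = sum(1 for a, b, c in combinations(range(n), 3)
                    if b in adj[a] and c in adj[a] and c in adj[b])
    return (len(edges), two_stars, triangles)

def log_odds_ratio(rest, d1, d2, theta, n):
    """Conditional log odds ratio of dyads d1 and d2 given `rest`; the
    normalizing constant cancels, so theta . s(y) is sufficient."""
    def log_density(x1, x2):
        edges = set(rest) - {d1, d2}
        if x1:
            edges.add(d1)
        if x2:
            edges.add(d2)
        return sum(t * s for t, s in zip(theta, markov_stats(edges, n)))
    return (log_density(1, 1) + log_density(0, 0)
            - log_density(0, 1) - log_density(1, 0))

n = 6
theta = (-0.5, 0.3, 0.7)  # illustrative values only
rest = {dyad(0, 3), dyad(1, 2), dyad(1, 4), dyad(2, 5),
        dyad(3, 4), dyad(4, 5), dyad(0, 5)}
disjoint = log_odds_ratio(rest, dyad(0, 1), dyad(2, 3), theta, n)  # no shared node
shared = log_odds_ratio(rest, dyad(0, 1), dyad(0, 2), theta, n)    # share node 0
print(disjoint, shared)
```

The node-disjoint pair gives zero up to floating-point rounding. The pair sharing node 0 gives the 2-star parameter plus the triangle parameter times the indicator of dyad {1, 2}, which is present in `rest` here. This is an O(1) dependence of the same kind as the second case above.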

Appendix B: Diagnostic plots

Several diagnostic plots illustrate the quality of the proposed algorithms. We provide traceplots for two parameter constellations, which result in a sparse and a moderately dense network, each with 2200 nodes. Figure 8 shows that the mixing of pergm is about the same as that of ergm with random tie selection, while the TNT sampler of the ergm package reaches the target distribution faster. The opposite holds for the denser network in Fig. 9.

Fig. 8

Traceplots for a simulated network with edge, 2-star and triangle parameters \(\theta = (-0.1, -0.01, 0.02)\) for the different algorithms. These parameters correspond to a sparse network with an approximate density of 0.06.

Fig. 9

Traceplots for a simulated network with edge, 2-star and triangle parameters \(\theta = (6, -0.01, 0.03)\) for the different algorithms. These parameters correspond to a moderately dense network with an approximate density of 0.26.
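A rough way to see why these parameter vectors give densities near 0.06 and 0.26 is a mean-field heuristic, which is a swapped-in approximation and not the method used in the paper. It replaces the degrees and common-neighbor counts in the change statistics by their Erdős–Rényi expectations, \(2(N-2)p\) and \((N-2)p^2\). It then solves the self-consistency equation \(p = \text {logit}^{-1}(\theta _1 + 2(N-2)p\,\theta _2 + (N-2)p^2\theta _3)\) by bisection. Two further assumptions apply: the statistics are the plain counts, and the search interval [1e-6, 0.3] is hypothetical. The equation can have further roots outside this interval (for the denser setting, above 0.3), so this is only a sanity check.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field_density(theta, n, lo=1e-6, hi=0.3, tol=1e-10):
    """Heuristic (not the paper's method): solve
    p = logistic(theta[0] + 2 (n-2) p theta[1] + (n-2) p^2 theta[2])
    by bisection on [lo, hi]."""
    def h(p):
        eta = theta[0] + 2 * (n - 2) * p * theta[1] + (n - 2) * p * p * theta[2]
        return p - logistic(eta)
    if not h(lo) < 0 < h(hi):
        raise ValueError("interval does not bracket a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N = 2200
sparse = mean_field_density((-0.1, -0.01, 0.02), N)
dense = mean_field_density((6.0, -0.01, 0.03), N)
print(f"sparse: {sparse:.3f}, dense: {dense:.3f}")
```

The two roots come out at roughly 0.063 and 0.265, which is in line with the approximate densities stated in the captions of Figs. 8 and 9.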

To assess mixing, we simulate 1000 networks (iterations) per algorithm. Each network has \(N=2200\) nodes and is generated with \(10 \cdot N^2\) update steps after a burn-in period. We report density plots, traceplots, running means and autocorrelations of the resulting edge, 2-star and triangle counts (Figs. 10, 11, 12, 13).
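Running means and autocorrelations are standard chain summaries. Below is a minimal sketch of how they can be computed for a sequence of simulated statistics. The chain is an assumption for illustration: a synthetic AR(1) series with coefficient 0.9 stands in for, e.g., edge counts, and is not data from the paper. The helper names are hypothetical.

```python
import random

def running_mean(xs):
    """Running mean of a chain, as shown in a running-mean diagnostic plot."""
    means, total = [], 0.0
    for k, x in enumerate(xs, start=1):
        total += x
        means.append(total / k)
    return means

def autocorrelation(xs, max_lag):
    """Sample autocorrelation r_h = c_h / c_0 for lags 0..max_lag, with
    c_h = (1/n) * sum_t (x_t - mean) * (x_{t+h} - mean)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]

# Synthetic stand-in for a chain of statistics: AR(1) with coefficient 0.9.
rng = random.Random(0)
chain, x = [], 0.0
for _ in range(5000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

acf = autocorrelation(chain, max_lag=5)
print([round(r, 2) for r in acf])
print(running_mean(chain)[-1])
```

A slowly decaying autocorrelation, as in this synthetic chain, signals poor mixing. A running mean that settles quickly suggests the chain has reached its target.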

Fig. 10

Density plots of the simulated edge, 2-star and triangle counts for the different algorithms

Fig. 11

Traceplots of the edge, 2-star and triangle counts for the different algorithms

Fig. 12

Running means of the edge, 2-star and triangle counts for the different algorithms

Fig. 13

Autocorrelation of the edge, 2-star and triangle counts for the different algorithms

About this article

Cite this article

Bauer, V., Fürlinger, K. & Kauermann, G. A note on parallel sampling in Markov graphs. Comput Stat 34, 1087–1107 (2019). https://doi.org/10.1007/s00180-019-00880-4

  • DOI: https://doi.org/10.1007/s00180-019-00880-4
