Abstract
The paper proposes the use of parallel computing for Markov graphs, a subclass of exponential random graph models in which the network statistics induce a conditional independence structure amongst the edges of the network. This conditional independence allows edges to be simulated in parallel on multiple computing cores. Fast simulation in Markov models is helpful because parameter estimation cannot be carried out analytically but requires simulation-based routines such as Markov chain Monte Carlo. Particularly in large networks, this can be computationally very demanding or even infeasible, so numerical enhancements that accelerate computation are useful.
References
Bauer V (2016) pergm: parallel exponential random graph model simulation. https://github.com/VerenaMaier/pergm. Accessed 5 Dec 2018
Besag J (1972) Nearest-neighbor systems and the auto-logistic model for binary data. J R Stat Soc 34:75–83
Bhamidi S, Bresler G, Sly A (2011) Mixing time of exponential random graphs. Ann Appl Probab 21(6):2146–2170
Brockwell AE (2006) Parallel processing in Markov chain Monte Carlo simulation by pre-fetching. J Comput Graph Stat 15(1):246–261
Caimo A, Friel N (2011) Bayesian inference for exponential random graph models. Soc Netw 33(1):41–55
Dagum L, Menon R (1998) OpenMP: an industry-standard API for shared-memory programming. IEEE Comput Sci Eng 5(1):46–55
Eddelbuettel D, François R, Allaire J, Chambers J, Bates D, Ushey K (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40(8):1–18
Erdős P, Rényi A (1959) On random graphs. Publi Math Debr 6(290):290–297
Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839
Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832–842
Geyer C (1992) Practical Markov chain Monte Carlo. Stat Sci 7(4):473–483
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
Grama A, Karypis G, Kumar V, Gupta A (2003) Introduction to parallel computing, 2nd edn. Addison-Wesley, Boston
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M (2014) ergm: fit, simulate and diagnose exponential-family models for networks. The Statnet Project (http://www.statnet.org), http://CRAN.R-project.org/package=ergm, R package version 3.6.1. Accessed 11 Feb 2018
Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J Am Stat Assoc 76(373):33–50
Hummel RM, Hunter DR, Handcock MS (2012) Improving simulation-based algorithms for fitting ERGMs. J Comput Graph Stat 21(4):920–939
Hunter DR, Handcock MS (2006) Inference in curved exponential family models for networks. J Comput Graph Stat 15(3):565–583
Hunter DR, Krivitsky PN, Schweinberger M (2012) Computational statistical methods for social network analysis. J Comput Graph Stat 21(4):856–882
Kolaczyk ED (2009) Statistical analysis of network data. Springer, New York
Koskinen J, Daraganova G (2013) Exponential random graph model fundamentals. In: Lusher D, Koskinen J, Robins G (eds) Exponential random graph models for social networks. Cambridge University Press, Cambridge, pp 49–76
Koskinen J, Wang P, Robins G, Pattison P (2018) Outliers and influential observations in exponential random graph models. Psychometrika 83(4):809–830. https://doi.org/10.1007/s11336-018-9635-8
Lauritzen SL (1996) Graphical models, vol 17. Clarendon Press, Oxford
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/. Accessed 22 Jan 2016
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, Curran Associates, Inc., pp 539–547. http://papers.nips.cc/paper/4532-learning-to-discover-social-circles-in-ego-networks.pdf. Accessed 12 Dec 2015
Lusher D, Koskinen J, Robins G (2013) Exponential random graph models for social networks. Cambridge University Press, Cambridge
Marino M, Stawinoga A (2011) Statistical methods for social networks: a focus on parallel computing. Metodol zv 8(1):57–77
Morris M, Handcock MS, Hunter D (2008) Specification of exponential-family random graph models: terms and computational aspects. J Stat Softw 24(1):1–24
Murray I, Ghahramani Z, MacKay D (2006) MCMC for doubly-intractable distributions. In: Proceedings of the 22nd annual conference on uncertainty in artificial intelligence (UAI-06), AUAI Press, Arlington, Virginia
Newman M, Barkema G (1999) Monte Carlo methods in statistical physics. Oxford University Press, New York
OpenMP Architecture Review Board (1998) OpenMP application program interface. http://www.openmp.org. Accessed 7 Sept 2015
Pattison PE, Robins GL, Snijders TA, Wang P (2013) Conditional estimation of exponential random graph models from snowball sampling designs. J Math Psychol 57(6):284–296
R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 5 Mar 2016
Ripley RM, Snijders TAB, Preciado P (2011) Manual for SIENA version 4.0. University of Oxford, Oxford
Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007) Recent developments in exponential random graph (p*) models for social networks. Soc Netw 29(2):192–215
Schweinberger M (2011) Instability, sensitivity, and degeneracy of discrete exponential families. J Am Stat Assoc 106(496):1361–1370
Schweinberger M, Krivitsky PN, Butts CT, Stewart J (2017) Exponential-family models of random graphs: inference in finite-, super-, and infinite population scenarios. arXiv e-prints arXiv:1707.04800
Snijders TAB (2002) Markov chain Monte Carlo estimation of exponential random graph models. J Soc Struct 3(2):1–40
Snijders TAB (2010) Conditional marginalization for exponential random graph models. J Math Sociol 34(4):239–252
Snijders TAB, Pattison PE, Robins GL, Handcock MS (2006) New specifications for exponential random graph models. Sociol Methodol 36(1):99–153
Thiemichen S, Kauermann G (2017) Stable exponential random graph models with non-parametric components for large dense networks. Soc Netw 49:67–80
Tierney L, Rossini AJ, Li N, Sevcikova H (2016) snow: simple network of workstations. R package version 0.4-2. https://CRAN.R-project.org/package=snow. Accessed 10 Mar 2016
Wang P, Pattison P, Robins G (2013) Exponential random graph model specifications for bipartite networks—a dependence hierarchy. Soc Netw 35:211–222
Appendices
Appendix A: Proof of asymptotic Markovian structure in log changes
The Markovian independence assumption is violated in the example when log changes are used instead of simple sums such as the 2-star or triangle counts. We can, however, prove that an asymptotic independence structure holds, which justifies parallel draws in large networks. We show that the conditional log odds ratio for \(P(Y_{i,j}, Y_{l,m}|y_{rest})\) is of order \(O_p(N^{-1})\) if \(\{i,j\} \cap \{l,m\} = \emptyset\) and \(O_p(1)\) otherwise, where \(y_{rest}\) refers to the network except for the edges \(Y_{i,j}\) and \(Y_{l,m}\). In other words, ties that do not share a node become asymptotically independent in large networks. To simplify notation we first look at \(Y_{1,2}\) and \(Y_{3,4}\). The log odds ratio for these ties, which do not share a node, is given by
where
Note that \(a_{i,j}(N)\) and \(b_{i,j}(N)\) are of order \(O_p(N)\), where N is the number of nodes. A Taylor series approximation yields
Similarly, the component multiplied by \(\theta _3\) is of order \(O\left( b_{i,j}^{-1}(N)\right) \) and hence decreases to zero. This implies asymptotic independence of \(Y_{1,2}\) and \(Y_{3,4}\) given the rest of the network.
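As a point of reference, the quantity studied in this appendix can be written in generic form. The display below is a sketch under the assumption of a general exponential family \(P(Y=y) \propto \exp \{\theta ^\top s(y)\}\); the superscript notation \(y^{ab}\) (the network with \(Y_{i,j}=a\), \(Y_{l,m}=b\) and all other ties fixed at \(y_{rest}\)) is introduced here for illustration only and is not the authors' notation.

\[
\log \frac{P(Y_{i,j}=1, Y_{l,m}=1 \mid y_{rest})\, P(Y_{i,j}=0, Y_{l,m}=0 \mid y_{rest})}{P(Y_{i,j}=1, Y_{l,m}=0 \mid y_{rest})\, P(Y_{i,j}=0, Y_{l,m}=1 \mid y_{rest})}
= \theta ^\top \left[ s(y^{11}) - s(y^{10}) - s(y^{01}) + s(y^{00}) \right]
\]

For the plain 2-star count, the bracketed mixed difference is exactly 0 when the two ties share no node and exactly 1 when they share one node; for the plain triangle count it is 0 for disjoint ties and, for \(Y_{1,2}\) and \(Y_{1,3}\), equal to \(y_{2,3}\). A log odds ratio of zero is equivalent to conditional independence of the two ties, which is why a term that vanishes with growing N yields asymptotic independence.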
Let us now look at the pair \(Y_{1,2}\) and \(Y_{1,3}\), that is, at ties that share a node. The log odds are given by
Hence, the log odds ratio for ties sharing a node is given by
The first component in the bracketed term at \(\theta _2\) is of order \(O_p(N^{-1})\) since
The second component in the bracketed term is, however, of order O(1): the components in the sum are of order \(O_p(N^{-1})\), but the outer sum comprises \(O_p(N)\) such components, so that a component of order O(1) results. The bracketed term at \(\theta _1\) shows the same behaviour.
Hence, from (A) and (B) we conclude that the conditional log odds ratio is given by
In other words, we asymptotically obtain a Markovian independence structure, which permits drawing ties in parallel.
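The practical consequence of this independence structure can be illustrated with a minimal Python sketch. It is not the pergm implementation: it assumes a plain Markov graph with edge, 2-star and triangle counts (no log changes), an even number of nodes, and hypothetical helper names (`round_robin_rounds`, `tie_probability`, `parallel_sweep`). Dyads are grouped into rounds in which no two dyads share a node, so all conditional probabilities within a round can be computed from the same network state.

```python
import math
import random


def round_robin_rounds(n):
    """Split all dyads of n nodes (n even) into n - 1 rounds of node-disjoint dyads."""
    nodes = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append(
            [tuple(sorted((nodes[k], nodes[n - 1 - k]))) for k in range(n // 2)]
        )
        # Circle method: keep the first node fixed and rotate the others.
        nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
    return rounds


def tie_probability(adj, i, j, theta):
    """Conditional probability that tie {i, j} is present given all other ties.

    theta = (edge, 2-star, triangle) parameters of an assumed Markov graph.
    """
    n = len(adj)
    deg_i = sum(adj[i]) - adj[i][j]  # degree of i without the tie {i, j}
    deg_j = sum(adj[j]) - adj[i][j]  # degree of j without the tie {i, j}
    common = sum(adj[i][k] * adj[j][k] for k in range(n))  # shared neighbours
    logit = theta[0] + theta[1] * (deg_i + deg_j) + theta[2] * common
    return 1.0 / (1.0 + math.exp(-logit))


def parallel_sweep(adj, theta, rng):
    """One Gibbs sweep over all dyads, organised in node-disjoint rounds."""
    for dyads in round_robin_rounds(len(adj)):
        # Ties in a round share no node, so these probabilities do not depend
        # on each other; this is the loop one would distribute over cores.
        probs = [tie_probability(adj, i, j, theta) for i, j in dyads]
        for (i, j), p in zip(dyads, probs):
            adj[i][j] = adj[j][i] = int(rng.random() < p)
    return adj


if __name__ == "__main__":
    n = 6
    network = [[0] * n for _ in range(n)]
    parallel_sweep(network, theta=(-1.0, 0.05, 0.2), rng=random.Random(1))
    print(sum(map(sum, network)) // 2, "edges after one sweep")
```

In this sketch the list comprehension inside `parallel_sweep` runs serially; it only marks the step that is safe to parallelise. With log-change statistics the same scheme would be an approximation whose error shrinks with N, in line with the argument above.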
Appendix B: Diagnostic plots
Different diagnostic plots show the quality of the proposed algorithms. We provide traceplots for two parameter constellations, resulting in a sparse and a moderately dense network with 2200 nodes. Figure 8 shows that the mixing in pergm is about the same as the mixing in ergm with the random option for selecting ties. The TNT sampler of the ergm package reaches the target distribution faster. The contrary can be seen in Fig. 9 if the network is denser.
To assess adequate mixing, we simulate 1000 networks (iterations) per algorithm. Each network consists of \(N=2200\) nodes and is obtained with \(10 \cdot N^2\) update steps after a reasonable burn-in. We provide the density, traceplot, running mean and autocorrelation of the resulting edge, 2-star and triangle counts (Figs. 10, 11, 12, 13).
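Two of the diagnostics named above, the running mean and the autocorrelation, can be computed from a trace of simulated counts as in the following Python sketch. The function names are hypothetical, and the autocorrelation uses the common biased estimator (normalised by the lag-0 sum of squares); this is an assumption and not necessarily the estimator behind the figures.

```python
def running_mean(trace):
    """Cumulative mean of a trace after each iteration."""
    means, total = [], 0.0
    for t, value in enumerate(trace, start=1):
        total += value
        means.append(total / t)
    return means


def autocorrelation(trace, max_lag):
    """Sample autocorrelation of a trace for lags 0..max_lag."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [value - mean for value in trace]
    denom = sum(c * c for c in centred)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant trace")
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy trace standing in for, e.g., edge counts of successive networks.
    edge_counts = [10, 12, 11, 13, 12, 14, 13, 15]
    print(running_mean(edge_counts)[-1])
    print(autocorrelation(edge_counts, max_lag=3))
```

A running mean that settles quickly and autocorrelations that decay towards zero within a few lags are the usual informal signs of adequate mixing.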
Cite this article
Bauer, V., Fürlinger, K. & Kauermann, G. A note on parallel sampling in Markov graphs. Comput Stat 34, 1087–1107 (2019). https://doi.org/10.1007/s00180-019-00880-4
DOI: https://doi.org/10.1007/s00180-019-00880-4