A note on parallel sampling in Markov graphs

Bauer, Verena; Fürlinger, Karl; Kauermann, Göran

doi:10.1007/s00180-019-00880-4

Original Paper
Published: 11 March 2019

Volume 34, pages 1087–1107, (2019)
Computational Statistics

Abstract

The paper proposes the use of parallel computing for Markov graphs as a subclass of exponential random graph models where the network statistics induce a conditional independence structure amongst the edges of the network. This conditional independence allows simulation of edges in parallel using multiple computing cores. Simulation in Markov models is helpful, since parameter estimation cannot be carried out analytically but requires simulation-based routines such as Markov chain Monte Carlo. In particular in large networks this can be computationally very demanding or even infeasible. Therefore, numerical enhancements are useful to accelerate computation.
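
One way to exploit this conditional independence can be sketched as follows. In a Markov graph, two ties that share no node are conditionally independent given the rest of the network, so a set of node-disjoint dyads (a matching) can be updated at the same time. The Python sketch below splits all dyads into such matchings with a round-robin schedule and runs one Gibbs sweep for a model with edge, 2-star and triangle statistics. The function names, the parameter values and the round-robin partition are illustrative assumptions and not necessarily the scheme implemented in pergm; the drawing loop is written serially and only marks where several cores could be used.

```python
import math
import random


def round_robin_matchings(n):
    """Split all dyads of n nodes (n even) into n - 1 node-disjoint matchings."""
    if n % 2:
        raise ValueError("n must be even for this simple schedule")
    nodes = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append(
            [tuple(sorted((nodes[i], nodes[n - 1 - i]))) for i in range(n // 2)]
        )
        # Keep the first node fixed and rotate the others by one position.
        nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
    return rounds


def change_stats(y, i, j):
    """Change in (edges, 2-stars, triangles) when dyad (i, j) goes from 0 to 1."""
    n = len(y)
    deg_i = sum(y[i][k] for k in range(n) if k != j)
    deg_j = sum(y[j][k] for k in range(n) if k != i)
    shared = sum(y[i][k] * y[j][k] for k in range(n) if k not in (i, j))
    return 1, deg_i + deg_j, shared


def logistic(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))


def gibbs_sweep(y, theta, rng):
    """One Gibbs sweep over all dyads, processed matching by matching."""
    for matching in round_robin_matchings(len(y)):
        # Dyads in one matching share no node, so their change statistics
        # do not depend on each other and can be computed from the same state.
        probs = [
            logistic(sum(t * d for t, d in zip(theta, change_stats(y, i, j))))
            for i, j in matching
        ]
        # All dyads of the matching are then drawn; this loop could be
        # distributed across cores (assumption of this sketch).
        for (i, j), p in zip(matching, probs):
            y[i][j] = y[j][i] = 1 if rng.random() < p else 0
    return y


rng = random.Random(1)
network = [[0] * 10 for _ in range(10)]
gibbs_sweep(network, theta=(-1.0, 0.05, 0.1), rng=rng)
print(sum(map(sum, network)) // 2, "ties after one sweep")
```

Because the conditional probabilities within a matching are computed before any tie in it is redrawn, the result is the same as updating those dyads one after another, which is what makes the parallel draw valid under the Markov assumption.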

References

Bauer V (2016) pergm: parallel exponential random graph model simulation. https://github.com/VerenaMaier/pergm. Accessed 5 Dec 2018
Besag J (1972) Nearest-neighbor systems and the auto-logistic model for binary data. J R Stat Soc 34:75–83
Bhamidi S, Bresler G, Sly A (2011) Mixing time of exponential random graphs. Ann Appl Probab 21(6):2146–2170
Brockwell AE (2006) Parallel processing in Markov chain Monte Carlo simulation by pre-fetching. J Comput Graph Stat 15(1):246–261
Caimo A, Friel N (2011) Bayesian inference for exponential random graph models. Soc Netw 33(1):41–55
Dagum L, Menon R (1998) OpenMP: an industry-standard API for shared-memory programming. IEEE Comput Sci Eng 5(1):46–55
Eddelbuettel D, François R, Allaire J, Chambers J, Bates D, Ushey K (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40(8):1–18
Erdős P, Rényi A (1959) On random graphs. Publ Math Debr 6:290–297
Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839
Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832–842
Geyer C (1992) Practical Markov chain Monte Carlo. Stat Sci 7(4):473–483
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
Grama A, Karypis G, Kumar V, Gupta A (2003) Introduction to parallel computing, 2nd edn. Addison-Wesley, Boston
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M (2014) ergm: fit, simulate and diagnose exponential-family models for networks. The Statnet Project (http://www.statnet.org), http://CRAN.R-project.org/package=ergm, R package version 3.6.1. Accessed 11 Feb 2018
Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J Am Stat Assoc 76(373):33–50
Hummel RM, Hunter DR, Handcock MS (2012) Improving simulation-based algorithms for fitting ERGMs. J Comput Graph Stat 21(4):920–939
Hunter DR, Handcock MS (2006) Inference in curved exponential family models for networks. J Comput Graph Stat 15(3):565–583
Hunter DR, Krivitsky PN, Schweinberger M (2012) Computational statistical methods for social network analysis. J Comput Graph Stat 21(4):856–882
Kolaczyk ED (2009) Statistical analysis of network data. Springer, New York
Koskinen J, Daraganova G (2013) Exponential random graph model fundamentals. In: Lusher D, Koskinen J, Robins G (eds) Exponential random graph models for social networks. Cambridge University Press, Cambridge, pp 49–76
Koskinen J, Wang P, Robins G, Pattison P (2018) Outliers and influential observations in exponential random graph models. Psychometrika 83(4):809–830. https://doi.org/10.1007/s11336-018-9635-8
Lauritzen SL (1996) Graphical models, vol 17. Clarendon Press, Oxford
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/. Accessed 22 Jan 2016
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, Curran Associates, Inc., pp 539–547. http://papers.nips.cc/paper/4532-learning-to-discover-social-circles-in-ego-networks.pdf. Accessed 12 Dec 2015
Lusher D, Koskinen J, Robins G (2013) Exponential random graph models for social networks. Cambridge University Press, Cambridge
Marino M, Stawinoga A (2011) Statistical methods for social networks: a focus on parallel computing. Metodol zv 8(1):57–77
Morris M, Handcock MS, Hunter D (2008) Specification of exponential-family random graph models: terms and computational aspects. J Stat Softw 24(1):1–24
Murray I, Ghahramani Z, MacKay D (2006) MCMC for doubly-intractable distributions. In: Proceedings of the 22nd annual conference on uncertainty in artificial intelligence (UAI-06), AUAI Press, Arlington, Virginia
Newman M, Barkema G (1999) Monte Carlo methods in statistical physics. Oxford University Press, New York
OpenMP Architecture Review Board (1998) OpenMP application program interface. http://www.openmp.org. Accessed 7 Sept 2015
Pattison PE, Robins GL, Snijders TA, Wang P (2013) Conditional estimation of exponential random graph models from snowball sampling designs. J Math Psychol 57(6):284–296
R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 5 Mar 2016
Ripley RM, Snijders TAB, Preciado P (2011) Manual for SIENA version 4.0. University of Oxford, Oxford
Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007) Recent developments in exponential random graph (p*) models for social networks. Soc Netw 29(2):192–215
Schweinberger M (2011) Instability, sensitivity, and degeneracy of discrete exponential families. J Am Stat Assoc 106(496):1361–1370
Schweinberger M, Krivitsky PN, Butts CT, Stewart J (2017) Exponential-family models of random graphs: inference in finite-, super-, and infinite population scenarios. arXiv e-prints arXiv:1707.04800
Snijders TAB (2002) Markov chain Monte Carlo estimation of exponential random graph models. J Soc Struct 3(2):1–40
Snijders TAB (2010) Conditional marginalization for exponential random graph models. J Math Sociol 34(4):239–252
Snijders TAB, Pattison PE, Robins GL, Handcock MS (2006) New specifications for exponential random graph models. Sociol Methodol 36(1):99–153
Thiemichen S, Kauermann G (2017) Stable exponential random graph models with non-parametric components for large dense networks. Soc Netw 49:67–80
Tierney L, Rossini AJ, Li N, Sevcikova H (2016) snow: simple network of workstations. R package version 0.4-2. https://CRAN.R-project.org/package=snow. Accessed 10 Mar 2016
Wang P, Pattison P, Robins G (2013) Exponential random graph model specifications for bipartite networks—a dependence hierarchy. Soc Netw 35:211–222

Author information

Authors and Affiliations

Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstraße 33, 80539, Munich, Germany
Verena Bauer & Göran Kauermann
Munich Network Management Team, Ludwig-Maximilians-Universität München, Oettingenstraße 67, 80538, Munich, Germany
Karl Fürlinger

Corresponding author

Correspondence to Verena Bauer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of asymptotic Markovian structure in log changes

The Markovian independence assumption is violated in the example when log changes are used instead of simple sums such as the 2-star or triangle counts. We can, however, prove that an asymptotic independence structure holds, which justifies parallel draws in large networks. We show that the conditional log odds ratio for $P(Y_{i,j}, Y_{l,m}|y_{rest})$ is of order $O_p(N^{-1})$ if $\{i,j\} \cap \{l,m\} = \emptyset$, that is, if the two ties share no node, and of order $O_p(1)$ otherwise, where $y_{rest}$ refers to the network except for the edges $Y_{i,j}$ and $Y_{l,m}$. In other words, ties that do not share a node become asymptotically independent in large networks. To simplify notation we first look at $Y_{1,2}$ and $Y_{3,4}$. The log odds ratio (A) for these ties, which do not share a node, is given by

where

Note that $a_{i,j}(N)$ and $b_{i,j}(N)$ are of order $O_p(N)$, where N is the number of nodes. A first-order Taylor series approximation yields

$$\begin{aligned} \log \left( 2 + a_{i,j}(N) \right) - \log \left( a_{i,j}(N) \right) \approx \frac{2}{ a_{i,j}(N)}. \end{aligned}$$

(14)

Since $a_{i,j}(N)$ is of order $O_p(N)$, the difference in (14) is of order $O_p(N^{-1})$. Similarly, the component multiplied by $\theta _3$ is of order $O\left( b_{i,j}^{-1}(N)\right) $ and hence decreases to zero as N grows. This implies asymptotic independence of $Y_{1,2}$ and $Y_{3,4}$ given the rest of the network.
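
The quality of the first-order approximation in (14) can be checked numerically. The short Python sketch below compares the exact difference log(2 + a) - log(a) with 2/a for growing a; the function names and the chosen values of a are illustrative assumptions and stand in for $a_{i,j}(N)$.

```python
import math


def log_gap(a):
    """Exact value of log(2 + a) - log(a), written as log(1 + 2/a)."""
    return math.log1p(2.0 / a)


def taylor_gap(a):
    """First-order Taylor approximation 2/a."""
    return 2.0 / a


for a in (10.0, 100.0, 1000.0, 10000.0):
    exact = log_gap(a)
    approx = taylor_gap(a)
    # The approximation error shrinks like 2 / a**2, i.e. one order faster
    # than the difference itself.
    print(f"a={a:8.0f}  exact={exact:.6e}  2/a={approx:.6e}  error={approx - exact:.2e}")
```

The error of the approximation is bounded by $2/a^2$, so it is negligible relative to the leading term $2/a$ once a grows with N.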

Let us now look at the pair $Y_{1,2}$ and $Y_{1,3}$, that is, at ties that share a node. The log odds for $Y_{1,2}$ are given by

$$\begin{aligned} \text {Log odds}= & {} \log \left( \frac{P\left( y_{1,2} = 1, y_{1,3}, y_{rest}\right) }{P\left( y_{1,2} = 0, y_{1,3}, y_{rest} \right) } \right) \\= & {} \frac{1}{2} \theta _1 \left( \log \left( y_{1,3} + \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1,2} y_{2,k} \right) \right. \\&\left. +\, \sum _{j>2} y_{1,j} \left[ \log \left( 1 + y_{1,3} + \underbrace{\sum _{k \ne 2,3,j} y_{1,k} + \sum _{k \ne j,1} y_{k,j}}_{c_{1,j}(N)} \right) \right. \right. \\&\left. \left. -\log \left( y_{1,3} + \underbrace{\sum _{k \ne 2,3,j} y_{1,k} + \sum _{k \ne j,1} y_{k,j}}_{c_{1,j}(N)} \right) \right] \right) \\&+\,\frac{1}{3} \theta _2 \left( \log \left( y_{1,3} y_{2,3} + \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \right. \\&\left. +\, \sum _{j> 2} y_{1,j} \log \left( 1 \cdot y_{j,2} + y_{1,3} y_{j,3} + \underbrace{\sum _{k \ne 1,j,2,3} y_{1,k} y_{j,k}}_{d_{1,j}(N)} \right) \right. \\&\left. - \,\sum _{j>2} y_{1,j} \log \left( y_{1,3} y_{j,3} + \underbrace{\sum _{k \ne 1,j,2,3} y_{1,k} y_{j,k}}_{d_{1,j}(N)} \right) \right) \\= & {} \frac{1}{2} \theta _1 \left( \log \left( y_{1,3} + \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1,2} y_{2,k} \right) \right. \\&\left. + \,\sum _{j>2} y_{1,j} \Big [ \log \left( 1 + y_{1,3} + c_{1,j}(N) \right) -\log \left( y_{1,3} + c_{1,j}(N) \right) \Big ] \right) \\&+\frac{1}{3} \theta _2 \left( \log \left( y_{1,3} y_{2,3} + \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \right. \\&\left. + \,\sum _{j> 2} y_{1,j} \log \left( 1 \cdot y_{j,2} + y_{1,3} y_{j,3} + d_{1,j}(N) \right) \right. \\&\left. -\, \sum _{j>2} y_{1,j} \log \left( y_{1,3} y_{j,3} + d_{1,j}(N) \right) \right) \end{aligned}$$

Hence, the log odds ratio for ties sharing a node is given by

$$\begin{aligned}&\text {Log odds (B)} \\&\quad = \log \left( \frac{P\left( y_{1,2} = 1, y_{1,3} = 1, y_{rest}\right) P\left( y_{1,2} = 0, y_{1,3} = 0, y_{rest} \right) }{ P\left( y_{1,2} = 0, y_{1,3} = 1, y_{rest} \right) P\left( y_{1,2} = 1, y_{1,3} = 0, y_{rest} \right) } \right) \\&\quad = \frac{1}{2} \theta _1 \left( \log \left( 1 + \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1,2} y_{2,k} \right) - \log \left( \sum _{k \ne 2,3} y_{1,k} + \sum _{k \ne 1, 2} y_{2,k} \right) \right. \\&\quad \quad \left. +\,\sum _{j>2} y_{1,j} \Big [ \log \left( 2 + c_{1,j}(N)\right) - \log \left( c_{1,j}(N) \right) \Big ] \right) \\&\quad \quad + \,\frac{1}{3} \theta _2 \left( \log \left( 1 \cdot y_{2,3} +\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) - \log \left( \sum _{k \ne 1,2,3}y_{1,k} y_{2,k}\right) \right. \\&\qquad \left. +\, \sum _{j>2} y_{1,j} \Big [ \log \left( 1 \cdot y_{j,2} + 1 \cdot y_{j,3} + d_{1,j}(N) \right) - \log \left( 1 \cdot y_{j,3} + d_{1,j}(N) \right) \right. \\&\qquad \left. -\, \log \left( 1 \cdot y_{j,2} + d_{1,j}(N) \right) + \log \left( d_{1,j}(N) \right) \Big ] \right) \end{aligned}$$

The first component of the bracketed term multiplied by $\theta _2$ is of order $O_p(N^{-1})$, since a first-order Taylor expansion gives

$$\begin{aligned} \log \left( 1\cdot y_{2,3} +\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) \approx \log \left( \sum _{k \ne 1,2,3} y_{1,k} y_{2,k} \right) + \underbrace{\frac{1}{\sum _{k \ne 1,2,3} y_{1,k} y_{2,k} } y_{2,3}}_{O_p(N^{-1})} \end{aligned}$$

The second component of this bracketed term is, however, of order $O_p(1)$: each summand is of order $O_p(N^{-1})$, but the outer sum runs over $O_p(N)$ non-zero terms, so that the sum as a whole is of order $O_p(1)$. The bracketed term multiplied by $\theta _1$ shows the same behaviour.
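
The contrast between the two orders can be illustrated with a small simulation. The Python sketch below draws Bernoulli graphs of growing size and evaluates (i) the sum over neighbours j of node 1 of log(2 + c_{1,j}(N)) - log(c_{1,j}(N)), which appears in (B), and (ii) a single difference log(2 + a) - log(a) with a of order N. The Bernoulli graph, the tie probability 0.3, the restriction of j to nodes other than 1, 2 and 3, and the choice of a as the sum of two degrees are all assumptions made for illustration only.

```python
import math
import random


def bernoulli_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix with independent ties and no self-loops."""
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            y[i][j] = y[j][i] = 1 if rng.random() < p else 0
    return y


def shared_node_term(y):
    """Sum over neighbours j of node 0 of log(2 + c_j) - log(c_j).

    Indices 0, 1, 2 play the roles of nodes 1, 2, 3 in the text.
    """
    deg = [sum(row) for row in y]
    total = 0.0
    for j in range(3, len(y)):
        if y[0][j]:
            # Ties of node 0 excluding those to nodes 1, 2 and j,
            # plus ties of node j excluding the one to node 0.
            c = (deg[0] - y[0][1] - y[0][2] - 1) + (deg[j] - 1)
            if c > 0:
                total += math.log(2 + c) - math.log(c)
    return total


def disjoint_term(y):
    """Single difference log(2 + a) - log(a), with a an illustrative O(N) quantity."""
    a = sum(y[0]) + sum(y[1])
    return math.log(2 + a) - math.log(a)


rng = random.Random(2019)
for n in (100, 200, 400):
    graph = bernoulli_graph(n, 0.3, rng)
    print(n, round(shared_node_term(graph), 3), round(disjoint_term(graph), 4))
```

In such a run the first quantity stays at a roughly constant level as N grows, while the second shrinks towards zero, in line with the orders $O_p(1)$ and $O_p(N^{-1})$ derived above.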

Combining (A) and (B), we conclude that the conditional log odds ratio satisfies

$$\begin{aligned}&\log \left( \frac{P\left( y_{i,j} = 1, y_{l,m} = 1 | y_{rest}\right) P\left( y_{i,j} = 0, y_{l,m} = 0 | y_{rest} \right) }{P\left( y_{i,j} = 0, y_{l,m} = 1| y_{rest} \right) P\left( y_{i,j} = 1, y_{l,m} = 0| y_{rest} \right) }\right) \\&\quad ={\left\{ \begin{array}{ll} O_p(N^{-1}) &{} \text {if}\;\; \{i,j\} \cap \{l,m\} = \emptyset \\ O_p(1) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

In other words, we asymptotically obtain a Markovian independence structure, which justifies drawing ties in parallel.

Appendix B: Diagnostic plots

Several diagnostic plots illustrate the quality of the proposed algorithms. We provide traceplots for two parameter constellations, resulting in a sparse and a moderately dense network with 2200 nodes. Figure 8 shows that mixing in pergm is about the same as mixing in ergm with the random option for selecting ties. The TNT sampler of the ergm package reaches the target distribution faster. The opposite can be seen in Fig. 9, where the network is denser.

To assess adequate mixing, we simulate 1000 networks (iterations) per algorithm. Each network consists of $N=2200$ nodes and is generated with $10 \cdot N^2$ update steps after a reasonable burn-in. We provide the density, traceplot, running mean and autocorrelation of the resulting edge, 2-star and triangle counts (Figs. 10, 11, 12, 13).
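
Two of these diagnostics, the running mean and the autocorrelation, are simple to compute from a trace of simulated statistics. The following Python sketch shows one way to do so; the function names and the toy trace are hypothetical and are not taken from pergm or ergm.

```python
def running_mean(trace):
    """Mean of the first t values of the trace, for t = 1, ..., len(trace)."""
    means, total = [], 0.0
    for t, value in enumerate(trace, start=1):
        total += value
        means.append(total / t)
    return means


def autocorrelation(trace, max_lag):
    """Sample autocorrelation of the trace for lags 0, ..., max_lag."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [value - mean for value in trace]
    variance = sum(c * c for c in centred)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant trace")
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


# Toy trace standing in for, e.g., the edge count of successive simulated networks.
edge_counts = [1.0, 2.0, 3.0, 4.0]
print(running_mean(edge_counts))
print(autocorrelation(edge_counts, max_lag=2))
```

A running mean that settles quickly and autocorrelations that decay fast with the lag indicate good mixing; slowly decaying autocorrelations suggest that more update steps between stored networks are needed.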

About this article

Cite this article

Bauer, V., Fürlinger, K. & Kauermann, G. A note on parallel sampling in Markov graphs. Comput Stat 34, 1087–1107 (2019). https://doi.org/10.1007/s00180-019-00880-4

Received: 18 July 2016
Accepted: 04 March 2019
Published: 11 March 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00180-019-00880-4
