Abstract
The advent and availability of technology has brought us closer than ever through social networks. Consequently, there is a growing emphasis on mining social networks to extract information for knowledge and discovery. However, methods for social network analysis (SNA) have not kept pace with the data explosion. In this review, we describe directed and undirected probabilistic graphical models (PGMs), and highlight recent applications to social networks. PGMs represent a flexible class of models that can be adapted to address many of the current challenges in SNA. In this work, we motivate their use with simple and accessible examples to demonstrate the modeling and connect to theory. In addition, recent applications in modern SNA are highlighted, including the estimation and quantification of importance, propagation of influence, trust (and distrust), link and profile prediction, privacy protection, and news spread through microblogging. Applications are selected to demonstrate the flexibility and predictive capabilities of PGMs in SNA. Finally, we conclude with a discussion of challenges and opportunities for PGMs in social networks.
References
Afrasiabi MH, Guérin R, Venkatesh S (2013) Opinion formation in Ising networks. In: Information theory and applications workshop (ITA), 2013, pp 1–10. IEEE
Aggarwal CC (2011) An introduction to social network data analytics. Springer, Berlin
Agliari E, Burioni R, Contucci P (2010) A diffusive strategic dynamics for social systems. J Stat Phys 139(3):478–491
Al Hasan M, Zaki MJ (2011) A survey of link prediction in social networks. In: Social network data analytics. Springer, Berlin, pp 243–275
Anderson RM, May RM et al (1979) Population biology of infectious diseases: Part i. Nature 280(5721):361–367
Ayday E, Fekri F (2010) A belief propagation based recommender system for online services. In: Proceedings of the fourth ACM conference on recommender systems, pp 217–220. ACM
Bach SH, Broecheler M, Getoor L, O’Leary DP (2012) Scaling MPE inference for constrained continuous Markov random fields with consensus optimization. In: NIPS, pp 2663–2671
Berry MJ, Linoff G (1997) Data mining techniques: for marketing, sales, and customer support. Wiley, New York
Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network analysis and mining for business applications. ACM Trans Intell Syst Technol (TIST) 2(3):22
Broekel T, Hartog M (2013) Explaining the structure of inter-organizational networks using exponential random graph models. Ind Innov 20(3):277–295
Bromberg F, Margaritis D, Honavar V et al (2009) Efficient Markov network structure discovery using independence tests. J Artif Intell Res 35(2):449
Cha M, Mislove A, Adams B, Gummadi KP (2008) Characterizing social cascades in Flickr. In: Proceedings of the 1st workshop on online social networks (WOSN’08), Seattle, WA
Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings of the 18th annual World wide web conference (WWW’09), Madrid, Spain
Chapelle O, Zhang Y (2009) A dynamic Bayesian network click model for web search ranking. In: Proceedings of the 18th international conference on World wide web, pp 1–10. ACM
Chen H, Ku WS, Wang H, Tang L, Sun MT (2013) Linkprobe: probabilistic inference on large-scale social networks. In: 2013 IEEE 29th international conference on data engineering (ICDE), pp 290–301. IEEE
Chickering DM, Heckerman D, Meek C (2001) Large-sample learning of Bayesian networks is NP-hard. J Mach Learn Res 5(2004):1287–1330
Coleman JS, Katz E, Menzel H et al (1966) Medical innovation: a diffusion study. Bobbs-Merrill Company Indianapolis, New York
Cowan R, Jonard N (2004) Network structure and the diffusion of knowledge. J Econ Dyn Control 28(8):1557–1575
Crane R, McDowell LK (2011) Evaluating Markov logic networks for collective classification. In: Proceedings of the 9th MLG workshop at the 17th ACM SIGKDD conference on knowledge discovery and data mining
Cranmer SJ, Desmarais BA (2011) Inferential network analysis with exponential random graph models. Polit Anal 19(1):66–86
Daud A, Li J, Zhou L, Muhammad F (2010) Knowledge discovery through directed probabilistic topic models: a survey. Front Comput Sci China 4(2):280–301
Dielmann A, Renals S (2004) Dynamic Bayesian networks for meeting structuring. In: IEEE international conference on acoustics, speech, and signal processing, 2004. Proceedings (ICASSP’04), vol 5, p V-629. IEEE
Dierkes T, Bichler M, Krishnan R (2011) Estimating the effect of word of mouth on churn and cross-buying in the mobile phone market with Markov logic networks. Decis Support Syst 51(3):361–371
Ding S (2011) Learning undirected graphical models with structure penalty. arXiv:1104.5256
Division NSR (1948) Rand database of worldwide terrorism incidents. http://www.rand.org/nsrd/projects/terrorism-incidents.html
Domingos P, Kok S, Lowd D, Poon H, Richardson M, Singla P (2008) Markov logic. In: Probabilistic inductive logic programming. Springer, Berlin, pp 92–117
Domingos P, Lowd D, Kok S, Nath A, Poon H, Richardson M, Singla P (2010) Markov logic: a language and algorithms for link mining. In: Link mining: models, algorithms, and applications. Springer, New York, pp 135–161
Fang L, LeFevre K (2010) Privacy wizards for social networking sites. In: Proceedings of the 19th international conference on World wide web, pp 351–360. ACM
Fellows I, Handcock MS (2012) Exponential-family random network models (preprint). arXiv:1208.0121
Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839
Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832–842
Freeman L (2004) The development of social network analysis. Empirical Press, Vancouver
Friedman N, Murphy K, Russell S (1998) Learning the structure of dynamic probabilistic networks. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., San Mateo, pp 139–147
Getoor L (2012) Social network datasets. http://www.cs.umd.edu/~sen/lbc-proj/LBC.html
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM, 2010 Proceedings IEEE, pp 1–9
Goldenberg A, Moore A (2004) Tractable learning of large Bayes net structures from sparse data. In: Proceedings of the twenty-first international conference on machine learning, p 44. ACM
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
Goodreau SM (2007) Advances in exponential random graph (\(p*\)) models applied to a large social network. Soc Netw 29(2):231–248
Goodreau SM, Kitts JA, Morris M (2009) Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography 46(1):103–125
Grabowski A, Kosiński R (2006) Ising-based model of opinion formation in a complex network of interpersonal interactions. Physica A: Stat Mech Appl 361(2):651–664
Hageman RS, Leduc MS, Korstanje R, Paigen B, Churchill GA (2011) A Bayesian framework for inference of the genotype–phenotype map for segregating populations. Genetics 187:1163–1170
Handcock MS, Robins G, Snijders TA, Moody J, Besag J (2003) Assessing degeneracy in statistical models of social networks. Technical report, Working paper
Handcock M, Hunter D, Butts C, Goodreau S, Morris M (2006) Statnet: an r package for the statistical analysis and simulation of social networks. Manual. University of Washington
He J, Chu WW, Liu ZV (2006) Inferring privacy information from social networks. In: Intelligence and security informatics. Springer, Berlin, pp 154–165
Heckerman D (2008) A tutorial on learning with Bayesian networks. Springer, Berlin
Humphreys L (2007) Mobile social networks and social practice: a case study of dodgeball. J Comput Mediat Commun 13(1):341–360
Hunter DR, Goodreau SM, Handcock MS (2008) Goodness of fit of social network models. J Am Stat Assoc 103(481):
Jabeur LB, Tamine L, Boughanem M (2012a) Featured tweet search: modeling time and social influence for microblog retrieval. In: 2012 IEEE/WIC/ACM international conferences on Web intelligence and intelligent agent technology (WI-IAT), vol 1, pp 166–173. IEEE
Jabeur LB, Tamine L, Boughanem M (2012b) Uprising microblogs: a Bayesian network retrieval model for tweet search. In: Proceedings of the 27th annual ACM symposium on applied computing, pp 943–948. ACM
Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power: tweets as electronic word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188
Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pp 56–65. ACM
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 137–146. ACM
Koelle D, Pfautz J, Farry M, Cox Z, Catto G, Campolongo J (2006) Applications of Bayesian belief networks in social network analysis. In: Proceedings of the 4th Bayesian modeling applications workshop, UAI conference
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. Massachusetts Institute of Technology, Cambridge
Krause SM, Böttcher P, Bornholdt S (2012) Mean-field-like behavior of the generalized voter-model-class kinetic Ising model. Phys Rev E 85(3):031126
Krebs VE (2002) Mapping networks of terrorist cells. Connections 24(3):43–52
Kuter U, Golbeck J (2007) Sunny: a new algorithm for trust inference in social networks using probabilistic confidence models. AAAI 7:1377–1382
Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web, pp 591–600. ACM
Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford
Lee SI, Ganapathi V, Koller D (2006) Efficient structure learning of Markov networks using \(l_1\)-regularization. In: Advances in neural information processing systems, pp 817–824
Lipford HR, Besmer A, Watson J (2008) Understanding privacy settings in Facebook with an audience view. UPSEC 8:1–8
Lusher D, Koskinen J, Robins G (2012) Exponential random graph models for social networks: theory, methods, and applications. Cambridge University Press, Cambridge
Madigan D, York J, Allard D (1995) Bayesian graphical models for discrete data. Int Stat Rev 63(2):215–232
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2011) Big data: the next frontier for innovation, competition, and productivity
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/USENIX Internet measurement conference (IMC’07), San Diego, CA
Morris M, Handcock MS, Hunter DR (2008) Specification of exponential-family random graph models: terms and computational aspects. J Stat Softw 24(4):1548
Mukherjee S, Speed T (2008) Network inference using informative priors. PNAS 11158:14313–14318
Murphy KP (2002) Dynamic Bayesian networks: representation, inference and learning. PhD thesis, University of California
Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, Cambridge
National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2015) University of Maryland. http://www.start.umd.edu/
Neville J, Jensen D (2007) Relational dependency networks. J Mach Learn Res 8:653–692
Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582
Newman ME, Watts DJ, Strogatz SH (2002) Random graph models of social networks. Proc Natl Acad Sci 99(suppl 1):2566–2572
Ounis I, Macdonald C, Lin J, Soboroff I (2011) Overview of the TREC-2011 microblog track. In: Proceedings of the 20th Text REtrieval Conference (TREC 2011)
Park J, Newman ME (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117
Pattison P, Wasserman S (1999) Logit models and logistic regressions for social networks: II. Multivariate relations. Br J Math Stat Psychol 52(2):169–193
Ravikumar P, Wainwright MJ, Lafferty JD et al (2010) High-dimensional Ising model selection using l1-regularized logistic regression. Ann Stat 38(3):1287–1319
Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1–2):107–136
Rinaldo A, Fienberg SE, Zhou Y et al (2009) On the geometry of discrete exponential families with application to exponential random graph models. Electr J Stat 3:446–484
Robins G, Pattison P, Wasserman S (1999) Logit models and logistic regressions for social networks: III. Valued relations. Psychometrika 64(3):371–394
Robins G, Pattison P, Elliott P (2001) Network models for social influence processes. Psychometrika 66(2):161–189
Robins G, Pattison P, Kalish Y, Lusher D (2007a) An introduction to exponential random graph (\(p*\)) models for social networks. Soc Netw 29(2):173–191
Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007b) Recent developments in exponential random graph (\(p*\)) models for social networks. Soc Netw 29(2):192–215
Salter-Townshend M, Murphy TB (2014) Role analysis in networks using mixtures of exponential random graph models. J Comput Graph Stat (just-accepted)
Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Min ASA Data Sci J 5(4):243–264
Santos FC, Pacheco JM, Lenaerts T (2006) Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proc Natl Acad Sci USA 103(9):3490–3494
Schaefer DR, Simpkins SD (2014) Using social network analysis to clarify the role of obesity in selection of adolescent friends. Am J Public Health 104(7):1223–1229
Schmidt MW, Murphy K, Fung G, Rosales R (2010) Structure learning in random fields for heart motion abnormality detection. In: IEEE conference on computer vision and pattern recognition, pp 1–8. IEEE
Scott J, Carrington PJ (2011) The SAGE handbook of social network analysis. SAGE Publications, London
Snijders TA, Pattison PE, Robins GL, Handcock MS (2006) New specifications for exponential random graph models. Sociol Methodol 36(1):99–153
Song X, Jiang S, Yan X, Chen H (2014) Collaborative friendship networks in online healthcare communities: an exponential random graph model analysis. In: Smart health, vol 8549. Springer, Switzerland, pp 75–87
Sparrow MK (1991) The application of network analysis to criminal intelligence: an assessment of the prospects. Soc Netw 13(3):251–274
Srihari S (2014) Probabilistic graphical models. In: Alhajj R, Rokne J (eds) Encyclopedia of social network analysis and mining. Springer, Berlin
Stanford (2011) Stanford network analysis package (snap). http://snap.stanford.edu
Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: Proceedings of the eighteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., USA, pp 485–492
Thiemichen S, Friel N, Caimo A, Kauermann G (2014) Bayesian exponential random graph models with nodal random effects (preprint). arXiv:1407.6895
Tresp V, Nickel M (2013) Relational models. In: Rokne J, Alhajj R (eds) Encyclopedia of social network analysis and mining. Springer, Heidelberg
Uddin S, Hamra J, Hossain L (2013a) Exploring communication networks to understand organizational crisis using exponential random graph models. Comput Math Organ Theory 19(1):25–41
Uddin S, Hossain L, Hamra J, Alam A (2013b) A study of physician collaborations through social network and exponential random graph. BMC Health Serv Res 13(1):234
Van den Bulte C, Lilien GL (2001) Medical innovation revisited: social contagion versus marketing effort. Am J Sociol 106(5):1409–1435
Vega-Redondo F (2007) Complex social networks, vol 44. Cambridge University Press, Cambridge
Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM SIGCOMM workshop on social networks (WOSN’09), Barcelona, Spain
Wan HY, Lin YF, Wu ZH, Huang HK (2012) Discovering typed communities in mobile social networks. J Comput Sci Technol 27(3):480–491
Wang Y, Vassileva J (2003) Bayesian network-based trust model. In: IEEE/WIC international conference on Web intelligence, 2003. WI 2003. Proceedings, pp 372–378. IEEE
Wasserman S, Pattison P (1996) Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and \(p*\). Psychometrika 61(3):401–425
Wortman J (2008) Viral marketing and the diffusion of trends on social networks
Xiang R, Neville J (2013) Collective inference for network data with copula latent Markov networks. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp 647–656. ACM
Yang X, Guo Y, Liu Y (2013) Bayesian-inference-based recommendation in online social networks. IEEE Trans Parallel Distrib Syst 24(4):642–651
Acknowledgments
A. N. is supported in part by a MURI grant (Number W911NF-09-1-0392) for Unified Research on Network-based Hard/Soft Information Fusion, issued by the US Army Research Office (ARO) under the program management of Dr. John Lavery, in part by the Academy of Finland Grant MineSocMed (Number 268078), and in part by the 2015 U.S. Air Force Summer Faculty Fellowship Program, sponsored by the Air Force Office of Scientific Research. R. H. B. is supported through NSF DMS 1312250.
Appendix
1.1 Similarity between MNs and ERGMs
While MNs and ERGMs were developed in different scientific domains, both specify exponential family distributions. MN models treat social network nodes as random variables, so their utility is most obvious in modeling processes on networks. ERGMs, on the other hand, were conceived to model network formation, where the edge presence indicators are treated as random variables (these random variables are dependent if their corresponding edges share a node). In fact, this application-related difference in what to treat as random is not fundamental. This Appendix makes the similarity between MNs and ERGMs more rigorous by redefining an ERGM as a PGM. We begin, however, by reviewing the branch of literature devoted exclusively to ERGMs.
As with MNs, a well-discussed problem in applying ERGMs to social networks is parameter estimation (Robins et al. 2007b), which is hampered by insufficient observed data. Robins et al. (2007b) outline this and other problems associated with ERGMs, e.g., degeneracy in model selection and bimodal distribution shapes (see also Handcock et al. 2003; Rinaldo et al. 2009; Snijders et al. 2006; Handcock et al. 2006).
The roots of ERGMs in the Principle of Maximum Entropy (Park and Newman 2004) and the Hammersley–Clifford theorem have been pointed out previously (Robins et al. 2001; Goldenberg et al. 2010). Here, we illustrate how MNs and ERGMs are similar in form and structure, using the most popular statistics in ERGMs. Under the assumption of Markov dependence, for a given social network, one can build a corresponding Markov network via the following conversion: (1) each node in the Markov network corresponds to an edge in the social network [Fienberg called this construct a “usual graphical model” for ERGMs (Fienberg 2012)]; (2) when two edges share a node in the social network, a link is built between the two corresponding nodes in the Markov network.
For each possible edge in the social network, a node is introduced in the MN; note that the original social network and the MN are not the same graph. Consider an ERGM whose statistics include the number of edges, \(f_{1}(y)\), the number of k-stars, \(f_{i}(y),i =2,\ldots ,N-1\), and the number of triangles, \(f_{N}(y)\). In an MN, a maximum entropy (maxent) model proposes the following form for the internal energy of the system: \(E_{c}(x)= -\sum _{i}{\alpha _{ci}g_{ci}}\), where \(g_{ci}\) is the \(i^{th}\) feature of clique \(c \in \varOmega \) and \(\alpha _{ci}\) is its corresponding weight in G. Thus, \(\psi _{c}(x)=\exp \{\beta _c\sum _{i=1}^N{\alpha _{ci}g_{ci}}\}\). Since the MN has too many parameters, their number can be reduced by imposing homogeneity constraints similar to those of ERGMs (Robins et al. 2007a). Before imposing such constraints, the following facts are required.
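To make the three families of statistics concrete, the following minimal sketch counts edges, k-stars, and triangles for a small undirected network. The function name `ergm_statistics` and the five-actor edge list are hypothetical, chosen only for illustration; the k-star count uses the standard identity \(\sum _v \binom{d_v}{k}\) over node degrees \(d_v\), which is an assumption about how the statistic is computed rather than something stated in the text.

```python
from itertools import combinations
from math import comb


def ergm_statistics(n_actors, edges):
    """Count edges, k-stars (k = 2..N-1) and triangles of an undirected graph.

    Actors are labelled 1..n_actors; `edges` is an iterable of actor pairs.
    """
    edge_set = {frozenset(e) for e in edges}

    # Node degrees, needed for the k-star counts.
    degree = {v: 0 for v in range(1, n_actors + 1)}
    for e in edge_set:
        for v in e:
            degree[v] += 1

    stats = {"edges": len(edge_set)}

    # A k-star is a node together with k of its incident edges.
    for k in range(2, n_actors):
        stats[f"{k}-stars"] = sum(comb(d, k) for d in degree.values())

    # A triangle is a triple of actors with all three edges present.
    stats["triangles"] = sum(
        1
        for a, b, c in combinations(range(1, n_actors + 1), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    return stats


# Hypothetical five-actor network (not the network of Fig. 8).
example_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(ergm_statistics(5, example_edges))
```

Each entry of the returned dictionary plays the role of one \(f_i(y)\) in the ERGM exponent.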
It is straightforward to show that G contains cliques of sizes \(\{3, \ldots ,N-1\}\). In addition, all substructures in \(G_s\) can be redefined by features in G. Considering these points, we can rewrite the joint probability of all variables represented by the MN, P(X), as follows:
In (4), \(Z(\alpha )\) is the partition function, which depends on the parameters. Here, the homogeneity assumption means \(\alpha _{ci}=\theta _i'\; \forall \; c=1,\ldots , C\); then P(X) is:
In (5), let \(Z'=Z(\theta ')\). In addition, let \(f_i'\) denote \(\sum _{c=1}^C{\beta _cg_{ci}}\), meaning that substructures i in all cliques c are added up with weights \(\beta _c\). Finally, substituting \(f_i'\) into (5) gives:
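Written out, the three steps described above would plausibly read as follows. This is a sketch reconstructed from the definition of \(\psi _c(x)\), the homogeneity constraint \(\alpha _{ci}=\theta _i'\), and the definition of \(f_i'\); it assumes the usual product-of-potentials form of an MN as the starting point, and the tags simply mirror the equation numbers referenced in the prose.

```latex
\begin{align}
P(X=x) &= \frac{1}{Z(\alpha)} \prod_{c=1}^{C} \psi_c(x)
        = \frac{1}{Z(\alpha)}
          \exp\Big\{ \sum_{c=1}^{C} \beta_c \sum_{i=1}^{N} \alpha_{ci}\, g_{ci} \Big\}
          \tag{4} \\
P(X=x) &= \frac{1}{Z(\theta')}
          \exp\Big\{ \sum_{i=1}^{N} \theta_i' \sum_{c=1}^{C} \beta_c\, g_{ci} \Big\}
          \tag{5} \\
P(X=x) &= \frac{1}{Z'}
          \exp\Big\{ \sum_{i=1}^{N} \theta_i' f_i' \Big\}
          \notag
\end{align}
```

The step from (4) to (5) only exchanges the order of the two finite sums after replacing \(\alpha _{ci}\) with \(\theta _i'\); the last line is (5) with \(f_i'\) and \(Z'\) substituted.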
Comparing \(P(Y=y)\) and (4) confirms that ERGMs and MNs are similar and that, under the following conditions, they are identical:
1. \(\theta _i=\theta _i'\),
2. \(f_i=f_i'=\sum _{c=1}^C{\beta _cg_{ci}}\).
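The ERGM side of this comparison, \(P(Y=y)\propto \exp \{\sum _i \theta _i f_i(y)\}\), can be evaluated exactly on a very small network, which also shows concretely why the partition function depends on the parameters. The sketch below is illustrative only: it assumes four actors (so that all 64 graphs can be enumerated), uses just the edge and triangle statistics, and the function name and parameter values are hypothetical.

```python
from itertools import combinations, product
from math import exp


def ergm_distribution(n_actors, theta_edges, theta_triangles):
    """Exact P(Y = y) for every undirected graph y on n_actors nodes.

    Each graph is encoded as a tuple of 0/1 edge indicators, one per dyad,
    in the order produced by itertools.combinations.
    """
    dyads = list(combinations(range(n_actors), 2))
    triples = list(combinations(range(n_actors), 3))

    weights = {}
    for indicators in product((0, 1), repeat=len(dyads)):
        present = {d for d, on in zip(dyads, indicators) if on}
        n_edges = len(present)
        n_triangles = sum(
            1 for a, b, c in triples
            if {(a, b), (a, c), (b, c)} <= present
        )
        # Unnormalised weight exp{theta . f(y)}.
        weights[indicators] = exp(
            theta_edges * n_edges + theta_triangles * n_triangles
        )

    z = sum(weights.values())  # partition function Z(theta)
    return {y: w / z for y, w in weights.items()}


# Hypothetical parameter values, chosen only for illustration.
dist = ergm_distribution(4, theta_edges=-1.0, theta_triangles=0.5)
print(len(dist), sum(dist.values()))
```

Brute-force enumeration is feasible only for a handful of actors, since the number of graphs grows as \(2^{N(N-1)/2}\).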
The following numerical example (the same example as in the ERGM section) illustrates the similarities between ERGMs and MNs. The social network has five actors, \(N=5\) (Fig. 8). Under the Markov dependence assumption, there exists a unique corresponding Markov network, shown in Fig. 9, with 10 nodes. There are 15 cliques (so-called factors) of size three or four.
As already mentioned, the joint probability function of all variables in each clique is proportional to an exponential function of the internal energy. For instance:
where \(E_{1}(x)=-\sum _{i}{\alpha _{1i}g_{1i}}\) and \(\lambda \) is the distribution parameter. This simple example shows how ERGMs and MNs are the same in terms of the underlying concept and the expressed probability distribution.
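The node and clique counts quoted in the numerical example can be checked with a short script. The sketch below builds the Markov network for \(N=5\) by making one node per possible edge of the social network and linking two nodes whenever their edges share an actor, then enumerates maximal cliques with a plain Bron–Kerbosch recursion. The function names are hypothetical, and it is an assumption that the cliques counted in the text are the maximal ones.

```python
from itertools import combinations


def dependence_graph(n_actors):
    """Markov network under Markov dependence.

    Nodes are the possible edges (dyads) of the social network; two nodes
    are adjacent when the corresponding dyads share an actor.
    """
    dyads = list(combinations(range(1, n_actors + 1), 2))
    adj = {d: set() for d in dyads}
    for a, b in combinations(dyads, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


graph = dependence_graph(5)
cliques = maximal_cliques(graph)
sizes = sorted(len(c) for c in cliques)
print(len(graph), len(cliques), sizes)
```

Under these assumptions the size-four cliques are the sets of four dyads incident to a single actor, and the size-three cliques are the three dyads of a triangle among three actors.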
Cite this article
Farasat, A., Nikolaev, A., Srihari, S.N. et al. Probabilistic graphical models in modern social network analysis. Soc. Netw. Anal. Min. 5, 62 (2015). https://doi.org/10.1007/s13278-015-0289-6