Abstract
Social networks are known to be assortative with respect to many attributes, such as age, weight, wealth, level of education, ethnicity and gender: people who are similar with respect to these attributes tend to be more connected. This can be explained by influence and homophily. Independently of its origin, this assortativity gives us information about each node given its neighbors. Assortativity can thus be used to improve individual predictions in a broad range of situations where data are missing or inaccurate. This paper presents a general framework based on probabilistic graphical models that exploits social network structure to improve individual predictions of node attributes. Using this framework, we quantify the assortativity range leading to an accuracy gain in several situations, with various individual prediction profiles. We finally show how specific characteristics of the network can further enhance performance. For instance, the gender assortativity in real-world mobile phone data changes drastically according to some communication attributes. In this case, using the network topology indeed improves local predictions of node labels and moreover enables inferring missing node labels based on a subset of known vertices. In both cases, the proposed method performs statistically significantly better than state-of-the-art label propagation and feature extraction schemes in most settings.
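As a rough illustration of the assortativity notion used above, the following sketch computes Newman's assortativity coefficient for a categorical node attribute on an undirected network. It is only meant to make the quantity concrete and is not the framework proposed in this paper; the function name `attribute_assortativity`, the edge-list input format and the toy labels are assumptions made for this example.

```python
from collections import Counter
import math


def attribute_assortativity(edges, labels):
    """Newman's assortativity coefficient r for a categorical attribute.

    edges  : iterable of undirected (u, v) node pairs (hypothetical format)
    labels : dict mapping each node to its category
    Returns r in [-1, 1]; r > 0 means similar nodes tend to be connected.
    """
    # Count every undirected edge in both directions so that the
    # mixing matrix e is symmetric.
    pair_counts = Counter()
    for u, v in edges:
        pair_counts[(labels[u], labels[v])] += 1
        pair_counts[(labels[v], labels[u])] += 1

    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("at least one edge is required")

    categories = set(labels.values())
    # e[(i, j)]: fraction of edge ends joining category i to category j.
    e = {pair: count / total for pair, count in pair_counts.items()}

    # Fraction of edges that connect two nodes of the same category.
    trace = sum(e.get((c, c), 0.0) for c in categories)
    # a[i]: fraction of edge ends attached to category i (row sums of e).
    a = {c: sum(e.get((c, d), 0.0) for d in categories) for c in categories}
    # Same-category fraction expected if edges ignored the attribute.
    expected = sum(a[c] ** 2 for c in categories)

    if math.isclose(expected, 1.0):
        raise ValueError("coefficient undefined: all edge ends share one category")
    return (trace - expected) / (1.0 - expected)


# Toy network with two women (F) and two men (M).
toy_labels = {"a": "F", "b": "F", "c": "M", "d": "M"}
r_same = attribute_assortativity([("a", "b"), ("c", "d")], toy_labels)
r_cross = attribute_assortativity([("a", "c"), ("b", "d")], toy_labels)
r_mixed = attribute_assortativity(
    [("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")], toy_labels
)
```

In this toy setting, a network with only same-gender links is perfectly assortative, one with only cross-gender links is perfectly disassortative, and an even mix of both carries no information about a node given its neighbors.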
Notes
A clique is a fully connected subgraph. A maximal clique is a clique that cannot be enlarged by adding any other node of the graph.
But with only 25% coverage.
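The clique definitions given in the first note can be checked mechanically. The sketch below is a minimal illustration, assuming the graph is stored as a dictionary mapping each node to the set of its neighbors; the helper names `is_clique` and `is_maximal_clique` and the toy graph are hypothetical and do not come from the paper.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """True if every pair of the given nodes is connected by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def is_maximal_clique(adj, nodes):
    """True if `nodes` is a clique that no other node of the graph can join."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # An outside node w could enlarge the clique only if it is
    # adjacent to every current member.
    return not any(nodes <= adj[w] for w in adj if w not in nodes)


# Toy undirected graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_adj = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3},
}
```

Here {1, 2} is a clique but not a maximal one, since node 3 is adjacent to both members, whereas {1, 2, 3} and {3, 4} are maximal cliques.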
Acknowledgements
DM and CdB are Research Fellows of the Fonds de la Recherche Scientifique - FNRS. The authors gratefully acknowledge Pål Roe Sundsøy for his help with the data.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mulders, D., de Bodt, C., Bjelland, J. et al. Inference of node attributes from social network assortativity. Neural Comput & Applic 32, 18023–18043 (2020). https://doi.org/10.1007/s00521-018-03967-z
DOI: https://doi.org/10.1007/s00521-018-03967-z