Inference of node attributes from social network assortativity

  • WSOM 2017
Neural Computing and Applications

Abstract

Social networks are known to be assortative with respect to many attributes, such as age, weight, wealth, level of education, ethnicity and gender: people who are similar with respect to these attributes tend to be more connected. This can be explained by social influence and homophily. Whatever its origin, this assortativity provides information about each node given its neighbors. It can thus be used to improve individual predictions in a broad range of situations where data are missing or inaccurate. This paper presents a general framework based on probabilistic graphical models that exploits social network structure to improve individual predictions of node attributes. Using this framework, we quantify the range of assortativity that leads to an accuracy gain in several situations, with various individual prediction profiles. Finally, we show how specific characteristics of the network can further enhance performance. For instance, gender assortativity in real-world mobile phone data changes drastically with some communication attributes. In this case, using the network topology improves local predictions of node labels and also enables the inference of missing node labels from a subset of known vertices. In both cases, the proposed method performs statistically significantly better than state-of-the-art label propagation and feature extraction schemes in most settings.
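
The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea: individual class probabilities are combined with an assortative edge potential in a pairwise Markov random field, and approximate marginals are obtained with sum-product loopy belief propagation. The function name `loopy_bp`, the `homophily` parameter and the symmetric 2x2 edge potential are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def loopy_bp(edges, priors, homophily=0.8, n_iter=50, tol=1e-9):
    """Sum-product loopy belief propagation on a pairwise binary MRF.

    edges:     list of undirected (i, j) node-index pairs.
    priors:    (n, 2) array of per-node class probabilities, e.g. the
               output of an individual (network-agnostic) classifier.
    homophily: assumed weight given to two linked nodes sharing a label
               (0.5 means the network carries no information).
    Returns an (n, 2) array of approximate posterior marginals.
    """
    priors = np.asarray(priors, dtype=float)
    n = priors.shape[0]
    # Symmetric edge potential: psi[x_i, x_j]
    psi = np.array([[homophily, 1.0 - homophily],
                    [1.0 - homophily, homophily]])

    neighbors = {i: [] for i in range(n)}
    msgs = {}  # msgs[(i, j)]: message from i to j, a vector over labels of j
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        msgs[(i, j)] = np.full(2, 0.5)
        msgs[(j, i)] = np.full(2, 0.5)

    for _ in range(n_iter):
        new_msgs = {}
        for (i, j) in msgs:
            # Product of the local evidence and all messages into i except from j
            incoming = priors[i].copy()
            for k in neighbors[i]:
                if k != j:
                    incoming *= msgs[(k, i)]
            m = psi.T @ incoming          # marginalise over the label of i
            new_msgs[(i, j)] = m / m.sum()
        delta = max((np.abs(new_msgs[e] - msgs[e]).max() for e in msgs), default=0.0)
        msgs = new_msgs
        if delta < tol:
            break

    # Belief of each node: local evidence times all incoming messages
    beliefs = priors.copy()
    for i in range(n):
        for k in neighbors[i]:
            beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Path graph 0 - 1 - 2: node 1 has an uninformative individual prediction,
# while both of its neighbors are confidently predicted as class 0.
edges = [(0, 1), (1, 2)]
priors = np.array([[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]])
beliefs = loopy_bp(edges, priors, homophily=0.8)
print(beliefs[1])  # node 1 is pulled towards class 0 by its neighbors
```

With `homophily=0.5` the messages are uniform and the beliefs reduce to the individual predictions, which mirrors the abstract's point that the gain depends on how assortative the network is.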

Notes

  1. A clique is a fully connected subgraph. A maximal clique is a clique that cannot be extended by adding any other node of the graph.

  2. But with only 25% coverage.
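
The two definitions in Note 1 can be checked mechanically. The sketch below is a minimal illustration, assuming the graph is stored as a dictionary mapping each node to the set of its neighbors; the helper names `is_clique` and `is_maximal_clique` are hypothetical and chosen for this example only.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of distinct nodes in `nodes` is connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def is_maximal_clique(adj, nodes):
    """True if `nodes` is a clique and no outside node is adjacent to all of them."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # A clique can be extended by w exactly when every member is a neighbor of w
    return not any(nodes <= adj[w] for w in adj if w not in nodes)

# Triangle 0-1-2 with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(is_clique(adj, {0, 1}))             # a clique, but not maximal
print(is_maximal_clique(adj, {0, 1}))     # node 2 can still be added
print(is_maximal_clique(adj, {0, 1, 2}))  # cannot be extended
```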

Acknowledgements

DM and CdB are Research Fellows of the Fonds de la Recherche Scientifique - FNRS. The authors gratefully acknowledge Pål Roe Sundsøy for his help with the data.

Author information

Corresponding author

Correspondence to Dounia Mulders.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mulders, D., de Bodt, C., Bjelland, J. et al. Inference of node attributes from social network assortativity. Neural Comput & Applic 32, 18023–18043 (2020). https://doi.org/10.1007/s00521-018-03967-z

  • DOI: https://doi.org/10.1007/s00521-018-03967-z
