Abstract
Applications of the Social Web are ubiquitous and have become an integral part of everyday life: users make friends with the help of online social networks, share thoughts via Twitter, or collaboratively write articles in Wikipedia. All such interactions leave digital traces; thus, users participate in the creation of heterogeneous, distributed, collaborative data collections. In linguistics, the Distributional Hypothesis states that words with similar distributional characteristics tend to be semantically related, i.e., words that occur in similar contexts are assumed to have a similar meaning. Considering users as (social) entities, their distributional characteristics can be observed by collecting interactions in social web applications. Accordingly, we state the social distributional hypothesis: we presume that users with similar interaction characteristics tend to be related. We conduct a series of experiments on social interaction networks from Twitter, Flickr, and BibSonomy and investigate relatedness with respect to the interactions, their frequency, and the specific interaction characteristics. The results indicate interrelations between structural similarity of interaction characteristics and semantic relatedness of users, supporting the social distributional hypothesis.
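To make the notion of "similar interaction characteristics" concrete, the following is a minimal sketch and not the method used in the article. It assumes that each user is described by a count vector over the targets they interacted with and that similarity is measured by cosine similarity; the function names, the choice of measure, and the toy data are all hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt


def interaction_profile(interactions, user):
    """Count how often `user` interacted with each target (hypothetical helper)."""
    return Counter(target for source, target in interactions if source == user)


def cosine_similarity(p, q):
    """Cosine similarity of two sparse count vectors; an assumed similarity measure."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # a user without interactions is similar to no one
    return dot / (norm_p * norm_q)


# Toy interaction log of (source user, target) pairs; purely illustrative data.
interactions = [
    ("alice", "x"), ("alice", "y"),
    ("bob", "x"), ("bob", "y"),
    ("carol", "z"),
]

alice = interaction_profile(interactions, "alice")
bob = interaction_profile(interactions, "bob")
carol = interaction_profile(interactions, "carol")

sim_alice_bob = cosine_similarity(alice, bob)      # identical profiles
sim_alice_carol = cosine_similarity(alice, carol)  # disjoint profiles
```

Under the hypothesis, a pair such as alice and bob, whose interaction profiles coincide, would be expected to be more closely related than alice and carol, whose profiles share no target.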
Notes
Note: For privacy reasons a user may deactivate this feature.
http://developer.yahoo.com/geo/placemaker/ (November 2011).
Acknowledgments
This work has been partially supported by the Commune project funded by the Hertie foundation.
Additional information
This article is part of the Topical Collection on Social Systems as Complex Networks.
Cite this article
Mitzlaff, F., Atzmueller, M., Hotho, A. et al. The social distributional hypothesis: a pragmatic proxy for homophily in online social networks. Soc. Netw. Anal. Min. 4, 216 (2014). https://doi.org/10.1007/s13278-014-0216-2
DOI: https://doi.org/10.1007/s13278-014-0216-2