Building rich social network data: a schema to assist in designing, collecting and evaluating social network data

  • Original Article
  • Social Network Analysis and Mining

Abstract

Creating a social network dataset requires us to represent a set of empirical observations according to a specific conceptual understanding. This involves a number of design decisions about the conceptual framework, which is then implemented through a data structure. In this paper, we propose a standard schema to describe these decisions. A standard schema allows us to define the conceptual framework, structure, and content of a dataset. Social network datasets may contain many features. Beyond the definition of actors and relations, network data may include actor or relation attributes, data for multiple observation periods (dynamic data), or parallel event data. The creation of a network dataset may also involve specific boundary conditions or sampling approaches, and the resulting data may be incomplete. Our proposed schema is designed to support a scientific approach to social network analysis by making these features and assumptions transparent and easy to communicate. We believe that this will assist researchers in the design, creation, communication, and evaluation of social network datasets. To develop this schema, we gathered and analysed the structure, content, and metadata of over 150 publicly available social network datasets drawing from multiple disciplines, including statistics, computer science, sociology, economics, and political science.
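
To make the kinds of design decisions listed in the abstract concrete, the following is a minimal sketch of how they might be recorded alongside a dataset. It is not the schema proposed in the article, whose details are not reproduced here; the class name, field names, and the example values are all hypothetical and chosen only to mirror the features the abstract mentions (actors, relations, attributes, observation periods, event data, boundary conditions, sampling, and missing data).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkDatasetDescription:
    """Hypothetical record of the design decisions behind a network dataset."""

    name: str
    actor_definition: str        # what counts as an actor (node)
    relation_definition: str     # what counts as a relation (tie)
    directed: bool = False
    actor_attributes: List[str] = field(default_factory=list)
    relation_attributes: List[str] = field(default_factory=list)
    observation_periods: int = 1            # more than 1 means dynamic data
    has_event_data: bool = False            # parallel event data present?
    boundary_condition: Optional[str] = None
    sampling_approach: Optional[str] = None
    missing_data_note: Optional[str] = None

    def is_dynamic(self) -> bool:
        """True when the dataset covers more than one observation period."""
        return self.observation_periods > 1

    def undocumented_decisions(self) -> List[str]:
        """Names of the optional design decisions that were left unstated."""
        decisions = {
            "boundary_condition": self.boundary_condition,
            "sampling_approach": self.sampling_approach,
            "missing_data_note": self.missing_data_note,
        }
        return [key for key, value in decisions.items() if value is None]


# Made-up example values, for illustration only.
example = NetworkDatasetDescription(
    name="example-classroom-survey",
    actor_definition="students in one class",
    relation_definition="self-reported friendship",
    directed=True,
    actor_attributes=["age"],
    observation_periods=3,
    boundary_condition="class roster",
    sampling_approach="full census of the roster",
)
print(example.is_dynamic())
print(example.undocumented_decisions())
```

Recording the decisions explicitly, rather than leaving them implicit in a file format, is what lets a reader see at a glance which assumptions have not been documented; here the example leaves its treatment of missing data unstated.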

Notes

  1. A tangible example of this is the myriad of different file formats used to store social network data. There are currently at least 20 standard storage formats (including GEXF, GML, GraphML, UCINET DL, Pajek NET, CSV, and plain edge lists) and many non-standard storage approaches. This can make data difficult to access for those unfamiliar with a given format, and hence makes these datasets harder to discover for researchers in some disciplines.

  2. Here, we use the term feature according to its standard interpretation: ‘a distinctive attribute or aspect of something’.

  3. University College Dublin Dynamics Lab, http://dl.ucd.ie
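
Note 1 observes that the same network can be stored in many file formats. As a minimal illustration (not taken from the article), the sketch below writes one tiny, made-up network both as a CSV edge list and in the style of a Pajek NET file. The node names and function names are hypothetical, only the Python standard library is used, and the Pajek output covers just the basic vertex and edge sections.

```python
import csv
import io

# A tiny, made-up undirected network given as (source, target) pairs.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat")]


def to_edgelist_csv(edge_pairs):
    """Serialise the edges as a CSV edge list with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edge_pairs)
    return buffer.getvalue()


def to_pajek_net(edge_pairs):
    """Serialise the edges in basic Pajek NET style (vertices numbered from 1)."""
    labels = sorted({node for pair in edge_pairs for node in pair})
    index = {label: number for number, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{number} "{label}"' for label, number in index.items()]
    lines.append("*Edges")
    lines += [f"{index[source]} {index[target]}" for source, target in edge_pairs]
    return "\n".join(lines) + "\n"


print(to_edgelist_csv(edges))
print(to_pajek_net(edges))
```

The two outputs carry the same ties but differ in how actors are identified (by label versus by number), which is one small example of why an unfamiliar format can be a barrier to reuse.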

Acknowledgments

This research is funded under Irish Government PRTLI Cycle 5 Simulation Sciences Programme and is co-funded under the European Regional Development Fund of the European Union.

Author information

Corresponding author

Correspondence to Diane Payne.

About this article

Cite this article

O’Loughlin, E., Payne, D. Building rich social network data: a schema to assist in designing, collecting and evaluating social network data. Soc. Netw. Anal. Min. 4, 198 (2014). https://doi.org/10.1007/s13278-014-0198-0

  • DOI: https://doi.org/10.1007/s13278-014-0198-0
