Semantically interconnected social networks

  • Review Article
Social Network Analysis and Mining

Abstract

Social network analysis aims to identify collaborations and helps people organize themselves through community participation and information sharing. The primary sources for social network modelling are explicit relationships such as co-authoring, citations and friendship. However, to integrate on-line community information and to fully describe the content and structure of community sites, secondary sources of information, such as documents, e-mails, blogs and discussions, can also be exploited. In this paper we describe a methodology and a battery of tools that automatically extract from documents the relevant topics shared among community members and that analyse the evolution of the network in terms of the emergence and decay of collaboration themes. Experiments are conducted on a scientific network funded by the European Community, the INTEROP network of excellence, and on the United Kingdom research community in medical image understanding and analysis.
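As a rough illustration of the idea of linking community members through the content of their documents rather than through explicit ties, the following minimal sketch builds a content-based network from toy data. It is not the authors' method: plain TF-IDF weighting with cosine similarity stands in for the topic extraction described in the abstract, and every name in it (`tfidf_vectors`, `content_network`, the `threshold` parameter, the sample members and texts) is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs_by_member):
    """Return one TF-IDF vector (dict: term -> weight) per member."""
    # Term counts over each member's concatenated, lowercased text.
    counts = {m: Counter(" ".join(texts).lower().split())
              for m, texts in docs_by_member.items()}
    n = len(counts)
    # Number of members whose text contains each term.
    df = Counter(term for c in counts.values() for term in c)
    return {m: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
            for m, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_network(docs_by_member, threshold=0.1):
    """Link two members when their documents are similar enough.

    The threshold is an arbitrary illustrative value, not one taken
    from the paper.
    """
    vecs = tfidf_vectors(docs_by_member)
    members = sorted(vecs)
    edges = {}
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            sim = cosine(vecs[a], vecs[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges


# Toy data (hypothetical): two members share vocabulary, one does not.
docs = {
    "alice": ["ontology interoperability enterprise modelling"],
    "bob": ["enterprise modelling ontology alignment"],
    "carol": ["medical image segmentation"],
}
edges = content_network(docs)
```

Here `edges` contains a single weighted tie between `alice` and `bob`, because only their texts share terms. Repeating the computation on documents from successive time windows would give a crude view of how such content-based ties appear and disappear.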

Notes

  1. http://xmlns.com/foaf/spec.

  2. http://www.interop-vlab.eu.

  3. See http://www.iturls.com/English/TechHotspot/TH_DocCluster.asp for a list of text-related clustering applications.

  4. The INTEROP ontology can also be browsed at http://lcl.uniroma1.it/tav/choose.jsp.

  5. http://acs.lbl.gov/~hoschek/colt.

  6. http://www.cs.umn.edu/~karypis/cluto.

  7. http://www.cs.waikato.ac.nz/ml/weka.

  8. http://www.nlm.nih.gov/mesh/.

  9. http://jung.sourceforge.net.

  10. This information is available on the INTEROP-Vlab KMap site http://interop-vlab.eu/backoffice/km.

  11. http://interop-vlab.eu/ei_public_deliverables/interop-noe-deliverables/.

Acknowledgments

The authors wish to thank Vincenzo Casini for his help in developing the GVI tool.

Author information

Corresponding author

Correspondence to Alessandro Cucchiarelli.

About this article

Cite this article

Cucchiarelli, A., D’Antonio, F. & Velardi, P. Semantically interconnected social networks. Soc. Netw. Anal. Min. 2, 69–95 (2012). https://doi.org/10.1007/s13278-011-0030-z

  • DOI: https://doi.org/10.1007/s13278-011-0030-z
