Paper Co-citation Analysis Using Semantic Similarity Measures

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1181)

Abstract

Co-citation analysis is a bibliometric technique for mining information on the relationships between scientific papers. Existing methods, however, rely on co-citation counting techniques that take the semantic aspect into consideration only slightly. The present study proposes a new technique based on measuring the Semantic Similarity (SS) between the titles of co-cited papers. Several computational measures quantify semantic similarity by relying on knowledge resources such as the WordNet «is a» taxonomy. Our proposal analyzes the SS between the titles of co-cited papers using word-based SS measures. Two major analytical experiments are performed: the first uses the benchmarks designed for testing word-based SS measures; the second exploits the DBLP (Digital Bibliography & Library Project) citation network dataset. We found that the SS measures behave consistently with human judgement of lexical similarity and can consequently be used for the automatic assessment of similarity between co-cited papers. The analysis of highly repeated co-citations shows that the different SS measures behave almost identically, with slight differences due to the distribution of the SS values they produce. Furthermore, we note that only a low percentage of co-cited papers are semantically similar.
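The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `PARENT` taxonomy is a hypothetical stand-in for the WordNet «is a» hierarchy, the Wu-Palmer formula is just one word-based SS measure among many, and the best-match averaging in `title_similarity` is an assumed way of lifting word scores to title scores. The paper ids, titles, and reference lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "is a" taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "method": "entity",
    "analysis": "method",
    "clustering": "method",
    "measure": "entity",
    "similarity": "measure",
    "relatedness": "measure",
    "document": "entity",
    "paper": "document",
    "citation": "document",
}

def path_to_root(word):
    """Return [word, parent, ..., root] by following the is-a links."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def depth(word):
    """Depth in the taxonomy, with the root at depth 1."""
    return len(path_to_root(word))

def wu_palmer(w1, w2):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(w1) + depth(w2))."""
    if w1 not in PARENT or w2 not in PARENT:
        return 0.0
    ancestors1 = set(path_to_root(w1))
    # First node on w2's upward path that also subsumes w1 is the deepest one.
    lcs = next(node for node in path_to_root(w2) if node in ancestors1)
    return 2.0 * depth(lcs) / (depth(w1) + depth(w2))

def title_similarity(title_a, title_b):
    """Assumed aggregation: symmetric average of best word-to-word matches."""
    words_a = [w for w in title_a.lower().split() if w in PARENT]
    words_b = [w for w in title_b.lower().split() if w in PARENT]
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        return sum(max(wu_palmer(s, d) for d in dst) for s in src) / len(src)

    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2

def cocitation_counts(references):
    """Count how often each pair of papers appears in the same reference list."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Invented example data: citing paper -> cited papers, and cited-paper titles.
titles = {
    "p1": "citation analysis",
    "p2": "paper clustering",
    "p3": "similarity measure",
}
references = {"c1": ["p1", "p2"], "c2": ["p1", "p2", "p3"], "c3": ["p2", "p1"]}

for (a, b), n in cocitation_counts(references).most_common():
    print(a, b, n, round(title_similarity(titles[a], titles[b]), 3))
```

Pairing each co-citation count with a title similarity score in this way makes it possible to check whether frequently co-cited papers are also semantically close, which is the question the second experiment addresses on the DBLP citation network.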

Notes

  1. Stanford CoreNLP provides a set of natural language tools for processing text; it gives the base forms of words and their parts of speech. http://nlp.stanford.edu/software/corenlp.shtml

  2. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

  3. http://wordnet.princeton.edu/

  4. http://aminer.org/billboard/DBLP_Citation

Author information

Corresponding author

Correspondence to Mohamed Ali Hadj Taieb.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hadj Taieb, M.A., Ben Aouicha, M., Turki, H. (2021). Paper Co-citation Analysis Using Semantic Similarity Measures. In: Abraham, A., Siarry, P., Ma, K., Kaklauskas, A. (eds) Intelligent Systems Design and Applications. ISDA 2019. Advances in Intelligent Systems and Computing, vol 1181. Springer, Cham. https://doi.org/10.1007/978-3-030-49342-4_26
