Abstract
Co-citation analysis is a bibliometric technique for mining information on the relationships between scientific papers. Existing methods, however, rely on co-citation counting techniques that take the semantic aspect only marginally into consideration. The present study proposes a new technique based on measuring the Semantic Similarity (SS) between the titles of co-cited papers. Several computational measures rely on knowledge resources, such as the WordNet «is a» taxonomy, to quantify semantic similarity. Our proposal analyzes the SS between the titles of co-cited papers using word-based SS measures. Two major analytical experiments are performed: the first uses the benchmarks designed for testing word-based SS measures; the second exploits the DBLP (Digital Bibliography & Library Project) citation network dataset. We found that the SS measures behave consistently with human judgement of lexical similarity and can consequently be used for the automatic assessment of similarity between co-cited papers. The analysis of highly repeated co-citations demonstrates that the different SS measures display nearly identical behaviours, with slight differences due to the distribution of the provided SS values. Furthermore, we note a low percentage of similar cited papers among the co-citations.
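As a rough illustration of the idea described in the abstract, the following Python sketch counts co-citations and scores the similarity of two titles with a word-based measure over an «is a» taxonomy. It rests on several assumptions that are not taken from the paper: the tiny `PARENT` taxonomy is a hypothetical stand-in for WordNet, the Wu and Palmer formula is used as one example of a word-based SS measure, and the title-level aggregation (average of best word-to-word matches, taken in both directions) is only one plausible choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "is a" taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "method": "entity",
    "analysis": "method",
    "clustering": "analysis",
    "classification": "analysis",
    "data": "entity",
    "network": "data",
    "graph": "data",
}

def cocitation_counts(reference_lists):
    """Count how often each pair of papers is cited together."""
    counts = Counter()
    for refs in reference_lists:
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def path_to_root(word):
    """Return the chain word -> ... -> root of the taxonomy."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def depth(word):
    """Depth in the taxonomy; the root has depth 1."""
    return len(path_to_root(word))

def wu_palmer(w1, w2):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(w1) + depth(w2))."""
    if w1 not in PARENT or w2 not in PARENT:
        return 0.0
    ancestors = set(path_to_root(w1))
    # First shared ancestor on the way up from w2 is the deepest common one.
    lcs = next(w for w in path_to_root(w2) if w in ancestors)
    return 2 * depth(lcs) / (depth(w1) + depth(w2))

def title_similarity(title_a, title_b):
    """Assumed aggregation: mean best-match word similarity, symmetrised."""
    a = [w for w in title_a.lower().split() if w in PARENT]
    b = [w for w in title_b.lower().split() if w in PARENT]
    if not a or not b:
        return 0.0

    def directed(src, dst):
        return sum(max(wu_palmer(s, d) for d in dst) for s in src) / len(src)

    return (directed(a, b) + directed(b, a)) / 2

if __name__ == "__main__":
    citing = [["p1", "p2", "p3"], ["p1", "p2"], ["p2", "p3"]]
    print(cocitation_counts(citing))
    print(title_similarity("Graph clustering", "Network classification"))
```

In a real setting the words would first be lemmatized and tagged (for example with a toolkit such as Stanford CoreNLP) and the taxonomy would be WordNet itself; the toy dictionary only keeps the sketch self-contained.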
Notes
- 1. Stanford CoreNLP provides a set of natural language tools for processing text; it gives the base forms of words and their parts of speech. http://nlp.stanford.edu/software/corenlp.shtml
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hadj Taieb, M.A., Ben Aouicha, M., Turki, H. (2021). Paper Co-citation Analysis Using Semantic Similarity Measures. In: Abraham, A., Siarry, P., Ma, K., Kaklauskas, A. (eds) Intelligent Systems Design and Applications. ISDA 2019. Advances in Intelligent Systems and Computing, vol 1181. Springer, Cham. https://doi.org/10.1007/978-3-030-49342-4_26
DOI: https://doi.org/10.1007/978-3-030-49342-4_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49341-7
Online ISBN: 978-3-030-49342-4
eBook Packages: Intelligent Technologies and Robotics (R0)