Abstract
Investigating the relationship between citation similarity and the citation interval offers insights for refining citation recommendation systems and enhancing citation evaluation models, and it provides a new perspective for understanding citation patterns. In this study, we used the Library and Information Science (LIS) field as an example to determine and discuss the correlation between citation similarity and the citation interval. The workflow comprised data collection, paper title preprocessing, text vectorization based on SimCSE, calculation of citation similarity and the citation interval, and calculation of the index per citing paper. For the LIS domain, this study found the following: (i) There is a significant negative correlation between citation similarity and the citation interval, but the correlation coefficient is low. (ii) The citation intervals of the least relevant series of cited papers are more susceptible to citation similarity than those of the most relevant series of cited papers. (iii) The citation intervals of the most relevant cited papers are more concentrated within 12 years and more likely to fall within the average citation interval; these papers typically come from the newer half of the cited paper list and are published later, within 5 years of the citation half-life. This study concludes that researchers usually pay more attention to the latest, most cutting-edge, and strongly relevant existing research than to weakly relevant existing research. Continuous attention to knowledge and its timely incorporation into the research direction will promote a more rapid and specialized diffusion of knowledge. These findings are influenced by the accelerated dissemination of information via the Internet, heightened academic competition, and the concentration of research endeavors in specialized disciplines.
This study not only contributes to the scholarly discussion of citation analysis but also lays the foundation for future exploration and understanding of citation patterns.
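The core quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors stand in for SimCSE title embeddings, the toy years and all function names are hypothetical, cosine similarity is assumed as the similarity measure, and Pearson's coefficient is assumed because the abstract does not say which correlation coefficient was used.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def citation_interval(citing_year, cited_year):
    """Years elapsed between the cited paper and the citing paper."""
    return citing_year - cited_year

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed choice of coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy data: one citing paper and three cited papers.
# The vectors are placeholders for SimCSE embeddings of paper titles.
citing = {"year": 2020, "vec": [1.0, 0.0]}
cited = [
    {"year": 2019, "vec": [1.0, 0.0]},
    {"year": 2015, "vec": [1.0, 1.0]},
    {"year": 2008, "vec": [0.0, 1.0]},
]

similarities = [cosine_similarity(citing["vec"], c["vec"]) for c in cited]
intervals = [citation_interval(citing["year"], c["year"]) for c in cited]
r = pearson(similarities, intervals)

print("similarities:", similarities)
print("intervals:", intervals)
print("correlation:", r)
```

In this toy data the more similar cited papers are also the more recent ones, so the coefficient is negative, which is the direction of the relationship reported in finding (i). The real analysis would replace the placeholder vectors with embeddings of preprocessed titles and aggregate over many citing papers.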






Ethics declarations
Conflict of interest
No conflicts of interest exist in the submission of this manuscript, and the manuscript has been approved for publication by all authors. We declare that the work described here is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All the authors listed have approved the enclosed manuscript.
About this article
Cite this article
Cheng, W., Zheng, D., Fu, S. et al. Closer in time and higher correlation: disclosing the relationship between citation similarity and citation interval. Scientometrics 129, 4495–4512 (2024). https://doi.org/10.1007/s11192-024-05080-6
DOI: https://doi.org/10.1007/s11192-024-05080-6