Abstract
This study outlines the intellectual structure of Library and Information Science (LIS) at the venue level using RefCit2vec, an embedding method inspired by word2vec. The reference lists or cited-by lists of 62,077 articles published in 35 venues (journals and proceedings) between 1928 and 2022 are converted into real-valued vectors by four independent RefCit2vec models. The document similarities measured by two of the RefCit2vec models correlate moderately with bibliographic coupling metrics, whereas the similarities from the other two models correlate moderately or strongly with co-citation metrics. Each venue is represented by its centroid, the average vector of its constituent documents. When hierarchical agglomerative clustering is applied to the venue centroids, 69% of venues robustly emerge in 6 of the 8 clusters. Four clusters consistently form the library-related branch. The bibliometrics/scientometrics branch contains only one cluster, whereas the information-related branch contains three clusters. Forty-three percent of venues fall into six subgroups with consistent tree structures. An article is defined as SCIM-alike if it is closer to the SCIM centroid than half of SCIM articles are. Based on their reference lists, 10% of JASIST articles are SCIM-alike; based on their cited-by lists, 5% are. The percentage of SCIM-alike articles in JASIST rose above the average between 2008 and 2018 but has dropped below the average since 2019. As these dynamics in LIS demonstrate, citation embedding methods such as RefCit2vec can incorporate citation-based, text-based, or authorship features to support varied scenarios in investigating or exploring research fronts and scientific knowledge transfer.
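The venue centroid and the SCIM-alike rule described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already embedded as plain lists of floats, cosine distance is assumed as the closeness measure (the abstract does not name one), and "closer to the SCIM centroid than half of SCIM articles are" is read as "at least half of the SCIM articles lie farther from the centroid". For reference, it also counts the two classic citation relations that the embedding similarities are compared against: bibliographic coupling (shared references) and co-citation (shared citing documents). All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def bibliographic_coupling(refs: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of references that documents a and b have in common."""
    return len(refs[a] & refs[b])


def co_citation(refs: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a venue's document vectors (the venue centroid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; an assumed closeness measure."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def is_scim_alike(doc: Vector, scim_docs: Sequence[Vector]) -> bool:
    """True if at least half of the SCIM articles lie farther from the
    SCIM centroid than doc does (one reading of the abstract's rule)."""
    c = centroid(scim_docs)
    d = cosine_distance(doc, c)
    farther = sum(1 for s in scim_docs if cosine_distance(s, c) > d)
    return farther >= len(scim_docs) / 2


def share_scim_alike(docs: Sequence[Vector], scim_docs: Sequence[Vector]) -> float:
    """Fraction of docs (e.g. one year of JASIST articles) that are SCIM-alike."""
    return sum(is_scim_alike(v, scim_docs) for v in docs) / len(docs)


if __name__ == "__main__":
    # Toy reference lists; keys are citing documents, values are what they cite.
    refs = {
        "p1": {"r1", "r2", "r3"},
        "p2": {"r2", "r3", "r4"},
        "p3": {"p1", "p2"},
        "p4": {"p1", "p2", "r1"},
    }
    print(bibliographic_coupling(refs, "p1", "p2"))  # → 2 (shared r2, r3)
    print(co_citation(refs, "p1", "p2"))             # → 2 (cited together by p3, p4)

    # Toy 2-D vectors standing in for RefCit2vec embeddings.
    scim = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
    jasist = [[0.85, 0.15], [0.0, 1.0]]
    print(share_scim_alike(jasist, scim))            # → 0.5
```

Tracking the yearly share would amount to calling `share_scim_alike` on each year's JASIST vectors and comparing the result with the overall average; the distance measure and the tie-handling in `is_scim_alike` are the main choices a real replication would need to confirm against the paper.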
Data availability
Data were retrieved via the Elsevier Research Products APIs (Elsevier B.V.).
Funding
This work was partially supported by the National Science and Technology Council of the Republic of China (Grant No. MOST 110-2410-H-002-232-MY2).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Kuang-hua Chen and Chien-chih Huang. The first draft of the manuscript was written by Chien-chih Huang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Cc., Chen, Kh. RefCit2vec: embedding models considering references and citations for measuring document similarity. Scientometrics 129, 4669–4693 (2024). https://doi.org/10.1007/s11192-024-05067-3
DOI: https://doi.org/10.1007/s11192-024-05067-3