Abstract
With the exponential growth in the number of published articles, recommending articles on the basis of the citation context (also called local or context-aware citation recommendation) has attracted many researchers in the last few years. Several recent papers have reviewed previous work on scientific paper recommendation. To the best of our knowledge, however, none of these reviews has carried out an in-depth study that explains citation context and compares previous studies. This paper presents a comparative analysis of recent studies on context-aware citation recommendation. It also identifies four gaps, related to citation context extraction, citation context classification, the temporal and structural aspects of a citation context, and benchmarking datasets. This comparative study can assist researchers interested in further exploring these four gaps.
Acknowledgements
This work has been supported by the Spanish State Research Agency through the project PID2019-105381 GA-I00 /AEI/10.13039/501100011033 (iScience).
Ethics declarations
Conflict of interest
The authors declare they have no conflict of interest.
Research involving human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Jebari, C., Herrera-Viedma, E. & Cobo, M.J. Context-aware citation recommendation of scientific papers: comparative study, gaps and trends. Scientometrics 128, 4243–4268 (2023). https://doi.org/10.1007/s11192-023-04773-8
DOI: https://doi.org/10.1007/s11192-023-04773-8