Context-aware citation recommendation of scientific papers: comparative study, gaps and trends

Scientometrics

Abstract

With the exponential increase in the number of published articles, recommending them on the basis of the citation context (also called local or citation-aware citation recommendation) has attracted many researchers in recent years. Several recent papers review prior work on scientific paper recommendation. However, as far as can be discerned, none of these reviews has carried out an in-depth study that explains citation context and compares previous studies. This paper presents a comparative analysis of recent studies on context-aware citation recommendation. It also identifies four gaps, related to citation context extraction, citation context classification, temporal and structural aspects of a citation context, and benchmarking datasets. This comparative study can assist researchers interested in further exploring these four gaps.

Notes

  1. https://dblp.org/statistics/index.html.

  2. https://github.com/knmnyn/ParsCit.

  3. https://scite.ai/

Acknowledgements

This work has been supported by the Spanish State Research Agency through the project PID2019-105381GA-I00/AEI/10.13039/501100011033 (iScience).

Author information

Corresponding author

Correspondence to Chaker Jebari.

Ethics declarations

Conflict of interest

The authors declare they have no conflict of interest.

Research involving human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jebari, C., Herrera-Viedma, E. & Cobo, M.J. Context-aware citation recommendation of scientific papers: comparative study, gaps and trends. Scientometrics 128, 4243–4268 (2023). https://doi.org/10.1007/s11192-023-04773-8

  • DOI: https://doi.org/10.1007/s11192-023-04773-8
