Towards employing native information in citation function classification

Scientometrics

A Correction to this article was published on 18 July 2022

This article has been updated

Abstract

Citations play a fundamental role in supporting authors’ contribution claims throughout a scientific paper. Labelling citation instances with function labels is therefore indispensable for understanding a scientific text. A single citation is a link between two scientific papers in the citation network, and it carries rich native information, including the citation context, the citation location, the citing and cited paper titles, the DOI, and the website’s URL. Nevertheless, previous studies have ignored this native information when building datasets, and consequently lack comprehensive and valuable features for the citation function classification task. In this paper, we argue that such information should not be ignored, and accordingly, we extract all of the native information features and integrate them into different neural text representation models via trainable embeddings and free text. We first construct a new dataset, NI-Cite, comprising a large number of labelled citations with five key native features (Citation Context, Section Name, Title, DOI, Web URL) for each instance. In addition, we integrate this information into recently developed text representation models and evaluate their performance on the citation function classification task. The experimental results demonstrate that the native information features suggested in this paper enhance the overall classification performance.
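
The two integration routes named in the abstract (free text and trainable embeddings) can be illustrated with a minimal sketch. It is not the authors' implementation: the `CitationInstance` record, its field names, the `[SEP]` separator, and both helper functions are assumptions made for illustration, and the NI-Cite schema may differ.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative, not the NI-Cite schema.
@dataclass
class CitationInstance:
    context: str       # citation context
    section_name: str  # section in which the citation appears
    title: str         # paper title
    doi: str
    web_url: str


SEP = " [SEP] "  # assumed separator string between features


def to_free_text(inst: CitationInstance) -> str:
    """Join the five native features into one string, skipping empty fields."""
    fields = [inst.context, inst.section_name, inst.title, inst.doi, inst.web_url]
    return SEP.join(f.strip() for f in fields if f and f.strip())


def build_section_vocab(instances) -> dict:
    """Map each normalized section name to an integer id (0 is reserved for unknown)."""
    vocab = {"<unk>": 0}
    for inst in instances:
        key = inst.section_name.strip().lower()
        if key and key not in vocab:
            vocab[key] = len(vocab)
    return vocab


if __name__ == "__main__":
    a = CitationInstance("We follow [1].", "Method", "Some Title", "", "https://example.org")
    b = CitationInstance("Compared with [2].", "Results", "Other Title", "", "")
    print(to_free_text(a))
    print(build_section_vocab([a, b]))
```

In this sketch, the string from `to_free_text` would be passed to a text encoder as ordinary input, while the integer ids from `build_section_vocab` could index a trainable embedding table in whichever framework is used.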

Notes

  1. https://github.com/young1010/nativeinformation.

  2. https://link.springer.com.

  3. https://ieeexplore.ieee.org.

  4. https://github.com/allenai/science-parse.

  5. https://www.semanticscholar.org/.

  6. https://dblp.uni-trier.de/.

  7. https://pubmed.ncbi.nlm.nih.gov/.

  8. https://pubmed.ncbi.nlm.nih.gov/.

  9. https://www.semanticscholar.org/.

Acknowledgements

This research is funded by the Australian Research Council (ARC) Discovery Project DP200102298 and the National Social Science Fund of China (No. 18ZDA325).

Author information

Corresponding author

Correspondence to Rongying Zhao.

Additional information

The original online version of this article was revised: In the original version the first affiliation was incorrectly linked to the author name, Adnan Mahmood.

About this article

Cite this article

Zhang, Y., Zhao, R., Wang, Y. et al. Towards employing native information in citation function classification. Scientometrics 127, 6557–6577 (2022). https://doi.org/10.1007/s11192-021-04242-0

  • DOI: https://doi.org/10.1007/s11192-021-04242-0
