Abstract
Citations play a significant role in the evaluation of scientific literature and researchers. Citation intent analysis is essential for understanding academic literature. Meanwhile, with the rapid growth of publicly accessible full-text literature, enriching the semantic information representation has become useful for the citation intent classification task. However, some useful information that is readily available in the citation context and facilitates citation intent analysis has not been fully explored. Furthermore, some deep learning models may not learn relevant features effectively because citation intent analysis tasks have insufficient training samples. Multi-task learning aims to exploit useful information shared among multiple tasks to improve learning performance, and it has shown promising results on many natural language processing tasks. In this paper, we propose a joint semantic representation model that combines pretrained language models with heterogeneous features of citation intent texts. Considering the correlation among the citation intent, citation section, and citation worthiness classification tasks, we build a multi-task citation classification framework with a soft parameter sharing constraint and construct an independent model for each task to improve the performance of citation intent classification. The experimental results demonstrate that the heterogeneous features and the multi-task framework with a soft parameter sharing constraint proposed in this paper improve overall citation intent classification performance.
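The abstract does not state the exact form of the soft parameter sharing constraint. One common formulation gives every task its own model and penalizes the distance between corresponding parameters, so the models are encouraged, rather than forced, to stay similar. The Python sketch below assumes that formulation: a squared L2 penalty between same-shaped per-task encoders, plain concatenation as the joint representation of a pretrained-language-model embedding and heterogeneous features, and a single ReLU hidden layer per task. All function names, the weight `lam`, and the toy dimensions and values are hypothetical; this illustrates the idea under those assumptions and is not the authors' model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def joint_representation(plm_embedding: Vector, extra_features: Vector) -> Vector:
    """Concatenate a pretrained-LM embedding with heterogeneous hand-crafted features."""
    return list(plm_embedding) + list(extra_features)


def task_loss(encoder: Matrix, head: Matrix, x: Vector, label: int) -> float:
    """Cross-entropy of one task-specific model: ReLU hidden layer + softmax classifier."""
    hidden = [max(0.0, h) for h in matvec(encoder, x)]
    logits = matvec(head, hidden)
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]


def soft_sharing_penalty(encoders: Dict[str, Matrix]) -> float:
    """Sum of squared L2 distances between every pair of same-shaped task encoders."""
    names = sorted(encoders)
    total = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for row_a, row_b in zip(encoders[a], encoders[b]):
                total += sum((p - q) ** 2 for p, q in zip(row_a, row_b))
    return total


def multitask_loss(models: Dict[str, Tuple[Matrix, Matrix]],
                   examples: Dict[str, Tuple[Vector, int]],
                   lam: float) -> float:
    """Sum of per-task losses plus lam times the soft parameter sharing penalty."""
    data_loss = sum(task_loss(enc, head, *examples[t]) for t, (enc, head) in models.items())
    penalty = soft_sharing_penalty({t: enc for t, (enc, _) in models.items()})
    return data_loss + lam * penalty


# Toy usage: a 2-d "PLM embedding" plus one heterogeneous feature (all values hypothetical).
x = joint_representation([0.5, -0.2], [1.0])
models = {
    # task: (encoder 2x3, tied to the others only by the penalty; head classes x 2, task-specific)
    "intent": ([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),
    "section": ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]),
    "worthiness": ([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [[1.0, -1.0], [-1.0, 1.0]]),
}
examples = {task: (x, 0) for task in models}  # gold label 0 for every task (toy)
print(soft_sharing_penalty({t: enc for t, (enc, _) in models.items()}))  # 2.0
print(multitask_loss(models, examples, lam=0.1))
```

Because only the encoders are tied by the penalty, each task keeps its own classifier head, so the intent, section, and worthiness tasks can use label sets of different sizes. Setting `lam` to zero reduces this setup to fully independent per-task models.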
Acknowledgements
This work is partially supported by grants from the Applied Basic Research Project of Liaoning Province (No. 2022JH2/101300270) and the Scientific Research Innovation Team Project of Dalian University of Foreign Languages (No. 2016CXTD06).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qi, R., Wei, J., Shao, Z. et al. Multi-task learning model for citation intent classification in scientific publications. Scientometrics 128, 6335–6355 (2023). https://doi.org/10.1007/s11192-023-04858-4
DOI: https://doi.org/10.1007/s11192-023-04858-4