Abstract
Citation function classification is an indispensable component of citation content analysis, which has numerous applications, ranging from improving informative citation indexers to facilitating resource search. Existing work primarily treats citation function classification as a sentence-level, single-label task. This ignores essential real-world phenomena and thereby introduces problems such as data bias and noisy information. For instance, a scientific paper typically contains many citations, and the context of each citation may discuss the cited paper at length, reflecting multiple citation functions. In this paper, we propose a novel task, Document-level Multi-label Citation Function Classification, which considerably extends previous work from a sentence-level single-label task to a document-level multi-label task. Given the complexity of document-level citation function analysis, we represent each citation as an independent token and propose a novel two-stage approach for fine-tuning a large-scale pre-trained language model so that this token is better represented in the document context. To enable this task, we introduce a new benchmark, TDMCite, comprising 9594 citations from online scientific papers, annotated for their functions using a three-aspect citation function annotation scheme. Experimental results suggest that our approach yields a considerable improvement over state-of-the-art BERT classification fine-tuning approaches.
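To make the two ideas in the abstract concrete (a citation as an independent token, and a multi-label rather than single-label output), here is a minimal, hypothetical sketch. It is not the authors' implementation, which this page does not describe. It assumes numeric bracketed citation markers such as "[3]" or "[3, 7]", maps every distinct cited reference to its own special token (the "[CIT_n]" format is an illustrative assumption), and replaces a single softmax decision with one independent sigmoid per label, so a citation can receive zero, one, or several functions. The label names and the 0.5 threshold are placeholders, not TDMCite's annotation scheme.

```python
import math
import re

# Assumed citation style: numeric bracketed markers such as "[3]" or "[3, 7]".
# This pattern is a hypothetical illustration, not the paper's preprocessing.
CITE_PATTERN = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def mark_citations(document: str) -> tuple[str, list[str]]:
    """Replace each cited reference with one special token per cited paper.

    Every mention of the same reference maps to the same token (e.g. "[CIT_3]"),
    so a fine-tuned model could learn one document-level representation for it.
    Returns the rewritten text and the distinct tokens in order of first use.
    """
    seen: list[str] = []

    def _replace(match: re.Match) -> str:
        tokens = []
        for ref_id in re.split(r"\s*,\s*", match.group(1)):
            token = f"[CIT_{ref_id}]"
            if token not in seen:
                seen.append(token)
            tokens.append(token)
        return " ".join(tokens)

    return CITE_PATTERN.sub(_replace, document), seen


def predict_functions(logits: list[float], labels: list[str],
                      threshold: float = 0.5) -> list[str]:
    """Multi-label decision: an independent sigmoid per label, not a softmax.

    Each label is kept when its probability reaches the (assumed) threshold,
    so the result may be empty or contain several citation functions.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


if __name__ == "__main__":
    marked, cites = mark_citations("We build on [3] and compare against [3, 7].")
    print(marked)  # We build on [CIT_3] and compare against [CIT_3] [CIT_7].
    print(cites)   # ['[CIT_3]', '[CIT_7]']
    # Hypothetical logits for one citation token over three placeholder labels.
    print(predict_functions([2.0, -1.0, 0.3], ["label_a", "label_b", "label_c"]))
    # ['label_a', 'label_c']
```

The sigmoid-per-label design is the standard way to turn a single-label classifier head into a multi-label one; with a softmax, the probabilities compete and only the top label would naturally be chosen.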
Acknowledgements
The research of Yang Zhang is funded under the auspices of Macquarie University’s Cotutelle-International Macquarie Research Excellence Scholarship. The research of Yufei Wang is funded by an MQ Research Excellence Scholarship and a CSIRO’s DATA61 Top-up Scholarship. This research is further funded in part by the Australian Research Council (ARC) Discovery Project DP200102298 and the National Social Science Fund Major Project of China (No. 18ZDA325).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Wang, Y., Sheng, Q.Z., Mahmood, A., Emma Zhang, W., Zhao, R. (2021). TDM-CFC: Towards Document-Level Multi-label Citation Function Classification. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_26
DOI: https://doi.org/10.1007/978-3-030-91560-5_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91559-9
Online ISBN: 978-3-030-91560-5
eBook Packages: Computer Science, Computer Science (R0)