TDM-CFC: Towards Document-Level Multi-label Citation Function Classification

Zhang, Yang; Wang, Yufei; Sheng, Quan Z.; Mahmood, Adnan; Zhang, Wei Emma; Zhao, Rongying

doi:10.1007/978-3-030-91560-5_26

Yang Zhang¹²,¹³,
Yufei Wang¹³,
Quan Z. Sheng¹³,
Adnan Mahmood¹³,
Wei Emma Zhang¹⁴ &
…
Rongying Zhao¹²

Conference paper
First Online: 01 January 2022

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13081)

Abstract

Citation function classification is an indispensable part of citation content analysis, with applications ranging from more informative citation indexers to easier resource search. Existing work mostly treats citation function classification as a sentence-level, single-label task. This ignores phenomena that occur in real papers and leads to problems such as data bias and noisy information: a scientific paper contains many citations, and a single citation context may discuss the cited paper at length and thereby reflect multiple citation functions. In this paper, we propose the novel task of Document-level Multi-label Citation Function Classification, which extends previous research from a sentence-level single-label task to a document-level multi-label task. Given the complexity of document-level citation function analysis, we propose a novel two-stage fine-tuning approach for a large-scale pre-trained language model. Specifically, we represent each citation as an independent token and use the two-stage fine-tuning to better represent that token in the document context. To enable this task, we introduce a new benchmark, TDMCite, comprising 9594 citations from online scientific papers, each annotated for its function under a three-aspect citation function annotation scheme. Experimental results suggest that our approach yields a considerable improvement over state-of-the-art BERT classification fine-tuning approaches.
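
To make the task formulation in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one way a target citation could be collapsed into a single placeholder token inside a document, and how several function labels for that citation could be encoded as a multi-hot vector. The token string `[CITE]`, the label names, and all helper functions are assumptions made for illustration; the abstract does not specify the actual token, the three-aspect label set, or any fine-tuning details.

```python
import re

# Hypothetical label inventory: the abstract does not list the labels
# of the three-aspect annotation scheme, so these names are placeholders.
LABELS = ["background", "use", "compare"]

# Assumed placeholder string standing in for the "independent token".
CITE_TOKEN = "[CITE]"


def mark_citation(document, target):
    """Replace every mention of the target citation with one placeholder token."""
    return document.replace(target, CITE_TOKEN)


def tokenize(text):
    """Toy tokenizer that keeps the placeholder intact as a single token."""
    pattern = re.escape(CITE_TOKEN) + r"|\w+|[^\w\s]"
    return re.findall(pattern, text)


def citation_positions(tokens):
    """Indices of the placeholder token, i.e. where the citation occurs in the document."""
    return [i for i, tok in enumerate(tokens) if tok == CITE_TOKEN]


def multi_hot(labels):
    """Encode a set of function labels as a multi-hot vector over LABELS."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in LABELS]


if __name__ == "__main__":
    doc = (
        "We build on BERT (Devlin et al., 2019). "
        "Unlike (Devlin et al., 2019), we fine-tune in two stages."
    )
    marked = mark_citation(doc, "(Devlin et al., 2019)")
    print(marked)
    print(citation_positions(tokenize(marked)))
    print(multi_hot({"background", "compare"}))
```

In this sketch a single citation that appears twice in the document receives one multi-hot target covering all of its functions, which is the difference from a sentence-level, single-label setup where each mention would get exactly one label.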

Acknowledgements

The research work of Yang Zhang is funded under the auspices of Macquarie University’s Cotutelle-International Macquarie Research Excellence Scholarship. The research work of Yufei Wang is funded by an MQ Research Excellence Scholarship and a CSIRO DATA61 Top-up Scholarship. This research is further funded in part by the Australian Research Council (ARC) Discovery Project DP200102298 and the National Social Science Fund Major Project of China (No. 18ZDA325).

Author information

Authors and Affiliations

School of Information Management, Wuhan University, Wuhan, Hubei, China
Yang Zhang & Rongying Zhao
Department of Computing, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, 2109, Australia
Yang Zhang, Yufei Wang, Quan Z. Sheng & Adnan Mahmood
School of Computer Science, The University of Adelaide, North Terrace, Adelaide, SA, 5005, Australia
Wei Emma Zhang

Corresponding author

Correspondence to Yang Zhang.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia
Wenjie Zhang
Peking University, Beijing, China
Lei Zou
Zayed University, Dubai, United Arab Emirates
Zakaria Maamar
Swinburne University of Technology, Melbourne, VIC, Australia
Lu Chen

About this paper

Cite this paper

Zhang, Y., Wang, Y., Sheng, Q.Z., Mahmood, A., Emma Zhang, W., Zhao, R. (2021). TDM-CFC: Towards Document-Level Multi-label Citation Function Classification. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_26

DOI: https://doi.org/10.1007/978-3-030-91560-5_26
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91559-9
Online ISBN: 978-3-030-91560-5
eBook Packages: Computer Science, Computer Science (R0)