TDM-CFC: Towards Document-Level Multi-label Citation Function Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13081)

Abstract

Citation function classification is an indispensable component of citation content analysis, which has numerous applications, ranging from improving informative citation indexers to facilitating resource search. Existing work mostly treats citation function classification as a sentence-level single-label task, ignoring essential real-world phenomena and thereby introducing problems such as data bias and noisy information. For instance, a scientific paper contains many citations, and each citation context may contain rich discussion of the cited paper that reflects multiple citation functions. In this paper, we propose a novel task, Document-level Multi-label Citation Function Classification, which extends previous research from a sentence-level single-label task to a document-level multi-label task. Given the complicated nature of document-level citation function analysis, we propose a novel two-stage fine-tuning approach for large-scale pre-trained language models. Specifically, we represent each citation as an independent token and use the two-stage fine-tuning to better represent it in the document context. To enable this task, we introduce a new benchmark, TDMCite, comprising 9594 citations (annotated for their function) from online scientific papers and built with a three-aspect citation function annotation scheme. Experimental results suggest that our approach yields a considerable improvement over state-of-the-art BERT classification fine-tuning approaches.
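To make the two ideas in the abstract concrete (a citation represented as its own token, and several functions per citation), the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the bracketed numeric marker format, the `[CITE_k]` placeholder names, the helper names `mark_citations` and `multi_hot`, and the three label names are all hypothetical and chosen only for illustration. The abstract does not specify the actual tokenization or the three-aspect annotation scheme.

```python
import re

# Hypothetical label set, for illustration only; NOT the TDMCite scheme.
LABELS = ["background", "use", "compare"]

# Assumed citation-marker format: numeric brackets such as [12].
CITATION_RE = re.compile(r"\[(\d+)\]")


def mark_citations(document):
    """Replace each distinct citation marker with its own placeholder token.

    Returns the rewritten text and a mapping from reference number to token,
    so every mention of the same cited paper shares one token in the document.
    """
    token_for = {}

    def substitute(match):
        ref = match.group(1)
        if ref not in token_for:
            token_for[ref] = f"[CITE_{len(token_for)}]"
        return token_for[ref]

    return CITATION_RE.sub(substitute, document), token_for


def multi_hot(functions):
    """Encode a set of citation functions as a multi-hot target vector."""
    return [1 if label in functions else 0 for label in LABELS]


doc = "We build on [3]. Unlike [7], our model reuses the encoder of [3]."
marked, token_for = mark_citations(doc)
target = multi_hot({"background", "use"})
```

In a multi-label setting such as the one described, a target like `target` would typically be paired with an independent sigmoid output per label rather than a single softmax, so that one citation can carry several functions at once. That pairing is a common design choice, not something the abstract confirms.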

Notes

  1. https://github.com/young1010/TDM-CFC

  2. https://www.aclweb.org/anthology

  3. https://huggingface.co/transformers/master/index.html

Acknowledgements

The research work of Yang Zhang is funded by Macquarie University's Cotutelle-International Macquarie Research Excellence Scholarship. The research work of Yufei Wang is funded by an MQ Research Excellence Scholarship and a CSIRO's DATA61 Top-up Scholarship. This research is further funded in part by Australian Research Council (ARC) Discovery Project DP200102298 and the National Social Science Fund Major Project of China (No. 18ZDA325).

Author information

Corresponding author

Correspondence to Yang Zhang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Wang, Y., Sheng, Q.Z., Mahmood, A., Emma Zhang, W., Zhao, R. (2021). TDM-CFC: Towards Document-Level Multi-label Citation Function Classification. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_26

  • DOI: https://doi.org/10.1007/978-3-030-91560-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91559-9

  • Online ISBN: 978-3-030-91560-5

  • eBook Packages: Computer Science, Computer Science (R0)
