DOI: 10.1145/3582768.3582769
research-article

Contextualised Modelling for Effective Citation Function Classification

Published: 27 June 2023

Abstract

Citation function classification is an important task in scientific text mining. Over the past two decades, many automated approaches have been developed on citation function datasets tailored to different annotation schemes, and deep learning has recently advanced the state of the art by a large margin. Several pitfalls remain, however. First, because annotation is difficult, datasets, and especially their minority classes, are often too small to train effective deep learning models. Second, and less often discussed, most state-of-the-art deep learning solutions generate a feature representation for the citation sentence or context rather than modelling individual in-text citations. This is conceptually flawed, because a single citation sentence commonly contains multiple in-text citations with different functions. Third, existing deep learning studies have explored only a rather limited design space for encoding a citation and its surrounding context. This paper explores a wide range of modelling options based on SciBERT, the popular cross-disciplinary pre-trained scientific language model, and compares their performance on citation function classification, in order to determine the most effective way of modelling a citation and its context. To address the data size issue, we created a large-scale citation function dataset by mapping, merging and re-annotating six publicly available datasets from the computational linguistics domain, adapting Teufel et al.’s 12-class scheme. The best F1 scores we achieved were around 66.16%, 71.39% and 73.56% on an 11-class annotation scheme slightly adapted from Teufel et al.’s 12-class scheme, a reduced 7-class scheme obtained by merging the comparison functions, and Jurgens et al.’s 6-class scheme, respectively. A useful observation is that no single model is best for all functions; the trained model variants therefore support applications that emphasise a specific citation function or a specific group of functions.
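
To illustrate the difference between sentence-level and citation-level modelling described in the abstract, the following minimal sketch builds one input string per in-text citation by wrapping the target citation in marker strings. It is not the encoding used in the paper: the marker strings `<cite>` and `</cite>`, the bracketed-number citation pattern, and the function name `per_citation_inputs` are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical marker strings; the paper's actual encoding may differ.
TARGET_OPEN, TARGET_CLOSE = "<cite>", "</cite>"

# Assumption for this sketch: citations look like [3] or [12].
CITATION_PATTERN = re.compile(r"\[\d+\]")


def per_citation_inputs(sentence: str) -> List[Tuple[str, str]]:
    """Return one (citation, marked_sentence) pair per in-text citation.

    Each marked sentence highlights exactly one target citation, so two
    citations in the same sentence yield two distinct classifier inputs.
    """
    inputs = []
    for match in CITATION_PATTERN.finditer(sentence):
        marked = (
            sentence[: match.start()]
            + TARGET_OPEN + match.group() + TARGET_CLOSE
            + sentence[match.end():]
        )
        inputs.append((match.group(), marked))
    return inputs


if __name__ == "__main__":
    example = "We extend the parser of [3] but, unlike [7], we avoid hand-written rules."
    for citation, marked in per_citation_inputs(example):
        print(citation, "->", marked)
```

With inputs of this kind, a classifier can assign different functions to the two citations in the example sentence, whereas a single sentence-level representation would force them to share one label.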

References

[1]
Myriam Hernández Alvarez, and José Manuel Gómez, 2016. Survey about citation context analysis: Tasks, techniques, and resources. Natural Language Engineering 22, 3, 327–349. https://doi.org/10.1017/S1351324915000388
[2]
Dongqing Lyu, Xuanmin Ruan, Juan Xie, and Ying Cheng, 2021. The classification of citing motivations: a meta‑synthesis. Scientometrics 126 (Feb, 2021), 3243–3264. https://doi.org/10.1007/s11192-021-03908-z
[3]
Suchetha N. Kunnath, Drahomira Herrmannova, David Pride, and Petr Knoth, 2021. A meta-analysis of semantic classification of citations. Quantitative Science Studies, 2, 4, 1170–1215. https://doi.org/10.1162/qss_a_00159
[4]
Mark Garzone, and Robert E. Mercer, 2000. Towards an Automated Citation Classifier. In Proceedings of the 2000 Conference of the Canadian Society for Computational Studies of Intelligence (Canadian AI’00). Springer, Berlin, Heidelberg, 337-346. https://doi.org/10.1007/3-540-45486-1_28
[5]
Hidetsugu Nanba, Noriko Kando, and Manabu Okumura, 2000. Classification of research papers using citation links and citation types: Towards automatic review article generation. In Proceedings of the 11th ASIS SIG/CR Classification Research Workshop. 117-134. http://dx.doi.org/10.7152/acro.v11i1.12774
[6]
Simone Teufel, 2010. The Structure of Scientific Articles: Applications to Citation Indexing and Summarization. CSLI Publications.
[7]
Simone Teufel, Advaith Siddharthan, and Dan Tidhar, 2006. An annotation scheme for citation function. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue (SIGdial’06). Association for Computational Linguistics, Stroudsburg, PA, USA, 80-87. https://aclanthology.org/W06-1312
[8]
Simone Teufel, Advaith Siddharthan, and Dan Tidhar, 2006. Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP’06). Association for Computational Linguistics, Stroudsburg, PA, USA, 103-110. https://aclanthology.org/W06-1613
[9]
David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, and Dan Jurafsky, 2018. Measuring the Evolution of a Scientific Field through Citation Frames. Transactions of the Association for Computational Linguistics 6 (2018), 391-406. https://aclanthology.org/Q18-1028
[10]
Khadidja Bakhti, Zhendong Niu, Abdallah Yousif, and Ally S. Nyamawe, 2018. Citation Function Classification Based on Ontologies and Convolutional Neural Networks. In Proceedings of the 7th International Workshop of Learning Technology for Education Challenges (LTEC’18). Springer, Heidelberg, Germany, 105-115. https://doi.org/10.1007/978-3-319-95522-3_10
[11]
Iz Beltagy, Kyle Lo, and Arman Cohan, 2019. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP’19). Association for Computational Linguistics, Stroudsburg, PA, USA, 3615-3620. https://aclanthology.org/D19-1371
[12]
Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady, 2019. Structural Scaffolds for Citation Intent Classification in Scientific Publications. In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’19). Association for Computational Linguistics, Stroudsburg, PA, USA, 3856-3896. https://aclanthology.org/N19-1361
[13]
Anne Lauscher, Brandon Ko, Bailey Kuehl, Sophie Johnson, Arman Cohan, David Jurgens, and Kyle Lo, 2022. MULTICITE: Modelling realistic citations requires moving beyond the single-sentence single-label setting. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL’22). Association for Computational Linguistics, Stroudsburg, PA, USA, 1875-1899. https://aclanthology.org/2022.naacl-main.137
[14]
Himanshu Maheshwari, Bhavyajeet Singh, and Vasudeva Varma, 2021. SciBERT Sentence Representation for Citation Context Classification. In Proceedings of the Second Workshop on Scholarly Document Processing (SDP’21). Association for Computational Linguistics, Stroudsburg, PA, USA, 130-133. https://aclanthology.org/2021.sdp-1.17
[15]
Tsendsuren Munkhdalai, John Lalor, and Hong Yu, 2016. Citation Analysis with Neural Attention Models. In Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis (LOUHI’16). Association for Computational Linguistics, Stroudsburg, PA, USA, 69-77. https://aclanthology.org/W16-6109
[16]
Abdallah Yousif, Zhendong Niu, and Ally S. Nyamawe, 2018. Citation Classification Using Multitask Convolutional Neural Network Model. In Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management (KSEM’18). Springer, Heidelberg, Germany, 232-243. https://doi.org/10.1007/978-3-319-99247-1_20
[17]
Abdallah Yousif, Zhendong Niu, James Chambua, and Zahid Younas Khan, 2019. Multi-task learning model based on recurrent convolutional neural networks for citation sentiment and purpose classification. Neurocomputing 335 (Mar, 2019), 195-205. https://doi.org/10.1016/j.neucom.2019.01.021
[18]
Xiang Li, Yifan He, Adam Meyers, and Ralph Grishman, 2013. Towards Fine-grained Citation Function Classification. In Proceedings of the 2013 International Conference Recent Advances in Natural Language Processing (RANLP’13). Association for Computational Linguistics, Stroudsburg, PA, USA, 402-407. https://aclanthology.org/R13-1052
[19]
Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, and Kai Eckert, 2017. Investigating convolutional networks and domain-specific embeddings for semantic classification of citations. In Proceedings of the 6th International Workshop on Mining Scientific Publications (WOSP’17). ACM, New York, NY, USA, 24-28. https://doi.org/10.1145/3127526.3127531
[20]
Myriam Hernández Alvarez, José Manuel Gómez, and Patricio Martínez-Barco, 2017. Citation function, polarity and influence classification. Natural Language Engineering 23, 4, 561-588. https://doi.org/10.1017/S1351324916000346
[21]
Xuan Su, Animesh Prasad, Min-Yen Kan, and Kazunari Sugiyama, 2019. Neural Multi-task Learning for Citation Function and Provenance. In Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL’19). IEEE, New York, NY, USA, 394-395. https://doi.org/10.1109/JCDL.2019.00122
[22]
Cailing Dong, and Ulrich Schäfer, 2011. Ensemble-style Self-training on Citation Classification. In Proceedings of 5th International Joint Conference on Natural Language Processing (IJCNLP’11). Association for Computational Linguistics, Stroudsburg, PA, USA, 623-631. https://aclanthology.org/I11-1070
[23]
Amjad Abu-Jbara, Jefferson Ezra, and Dragomir Radev, 2013. Purpose and Polarity of Citation: Towards NLP-based Bibliometrics. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’13). Association for Computational Linguistics, Stroudsburg, PA, USA, 596-606. https://aclanthology.org/N13-1067
[24]
Charles Jochim, and Hinrich Schütze, 2012. Towards a Generic and Flexible Citation Classifier Based on a Faceted Classification Scheme. In Proceedings of the 24th International Conference on Computational Linguistics (COLING’12). Association for Computational Linguistics, Stroudsburg, PA, USA, 1343-1358. https://aclanthology.org/C12-1082
[25]
Rui Meng, Wei Lu, Yu-huan Chi, and Shuguang Han, 2017. Automatic Classification of Citation Function by New Linguistic Features. In Proceedings of iConference 2017. iSchools Inc, Grandville, MI, USA, 826-830. https://doi.org/10.9776/17349
[26]
Shashank Agarwal, Lisha Choubey, and Hong Yu, 2010. Automatically Classifying the Role of Citations in Biomedical Articles. In Proceedings of the 2010 Annual Symposium of the American Medical Informatics Association (AMIA’10). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041379
[27]
Suchetha N. Kunnath, David Pride, Bikash Gyawali, and Petr Knoth, 2020. Overview of the 2020 WOSP 3C Citation Context Classification Task. In Proceedings of the 8th International Workshop on Mining Scientific Publications (WOSP’20). Association for Computational Linguistics, Stroudsburg, PA, USA, 75-83. https://aclanthology.org/2020.wosp-1.12
[28]
Suchetha N. Kunnath, David Pride, Drahomira Herrmannova, and Petr Knoth, 2021. Overview of the 2021 SDP 3C Citation Context Classification Shared Task. In Proceedings of the Second Workshop on Scholarly Document Processing (SDP’21). Association for Computational Linguistics, Stroudsburg, PA, USA, 137-145. https://aclanthology.org/2021.sdp-1.2
[29]
Muhammad Roman, Abdul Shahid, Shafiullah Khan, Anis Koubaa, and Lisu Yu, 2021. Citation Intent Classification Using Word Embedding. IEEE Access 9 (Jan, 2021), 9982-9995. https://doi.org/10.1109/ACCESS.2021.3050547
[30]
Suppawong Tuarob, Sung Woo Kang, Poom Wettayakorn, Chanatip Pornprasit, Tanakitti Sachati, Saeed-Ul Hassan, and Peter Haddawy, 2021. Automatic Classification of Algorithm Citation Functions in Scientific Literature. IEEE Transactions on Knowledge and Data Engineering 31, 10 (Apr, 2019), 1881–1896. https://doi.org/10.1109/TKDE.2019.2913376
[31]
He Zhao, Zhunchen Luo, Chong Feng, Aiqing Zheng, and Xiaopeng Liu, 2019. A Context-based Framework for Modeling the Role and Function of On-line Resource Citations in Scientific Literature. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, Stroudsburg, PA, USA, 5206–5215. https://aclanthology.org/D19-1524
[32]
Amjad Abu-Jbara, and Dragomir Radev, 2012. Reference Scope Identification in Citing Sentences. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’12). Association for Computational Linguistics, Stroudsburg, PA, 80-90. https://aclanthology.org/N12-1009
[33]
Peeyush Aggarwal, and Richa Sharma, 2016. Lexical and Syntactic cues to identify Reference Scope of Citance. In Proceedings of the 1st Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL’16). CEUR, 103-112. http://ceur-ws.org/Vol-1610/paper12.pdf
[34]
Do “Future Work” sections have a purpose? Citation links and entailment for global scientometric questions. In Proceedings of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017). CEUR. http://ceur-ws.org/Vol-1888/paper1.pdf
[35]
Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang, and Chengzhi Zhang, 2020. The ACL FWS-RC: A Dataset for Recognition and Classification of Sentence about Future Works. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL’20). ACM, New York, NY, USA, 261-269. https://doi.org/10.1145/3383583.3398526
[36]
Xiaorui Jiang, Xinghao Zhu, and Jingqiang Chen, 2020. Main path analysis on cyclic citation networks. Journal of the Association for Information Science and Technology 71, 5, 578-595. https://doi.org/10.1002/asi.24258

Cited By

  • (2025) GAN-CITE: leveraging semi-supervised generative adversarial networks for citation function classification with limited data. Scientometrics. DOI: 10.1007/s11192-025-05233-1. Online publication date: 28-Jan-2025

Published In

NLPIR '22: Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval
December 2022
241 pages
ISBN: 9781450397629
DOI: 10.1145/3582768

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Citation function classification
2. SciBERT
3. citation context analysis
4. citation intent identification
5. deep learning

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• National Planning Office for Philosophy and Social Sciences of China

Conference

NLPIR 2022
