Abstract
With the rapid growth of scientific literature, it is becoming increasingly difficult to identify scientific contributions in the deluge of research papers. Automatically identifying the specific contribution made in a research paper would enable quicker comprehension of the work, faster literature surveys, and easier comparison with related work. In this work, we investigate methods to automatically extract contribution statements from research articles. We design a multitask deep neural network that leverages section identification and citance classification of scientific statements to predict whether a given scientific statement specifies a contribution. In the long run, we envisage creating a knowledge graph of scientific contributions for machine comprehension and more straightforward navigation of research contributions in a particular domain. Our approach outperforms earlier methods (a relative improvement of 8.08% in terms of \(F_1\) score) for contributing sentence identification on a dataset of Natural Language Processing (NLP) papers. Our code is available at https://github.com/ammaarahmad1999/Sem-Eval-2021-Task-A.
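To make the phrase "relative improvement in terms of \(F_1\) score" concrete, the following minimal sketch shows how such a figure is typically computed for a binary contribution/non-contribution classifier. The function names and the counts are hypothetical and chosen only for illustration; they are not the paper's evaluation code or its results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (contribution) class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline`, relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical counts, NOT taken from the paper.
baseline_f1 = f1_score(tp=50, fp=50, fn=50)  # precision = recall = 0.5
new_f1 = f1_score(tp=60, fp=40, fn=40)       # precision = recall = 0.6

# An absolute gain of 0.1 over a baseline of 0.5 is a relative gain of about 20%.
print(round(relative_improvement(new_f1, baseline_f1), 2))
```

A relative improvement is measured against the baseline score, so it is larger than the corresponding absolute difference in \(F_1\) whenever the baseline is below 1.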
K. Gupta and A. Ahmad—Equal Contribution.
Acknowledgement
Asif Ekbal is a recipient of the Visvesvaraya Young Faculty Award and acknowledges Digital India Corporation, Ministry of Electronics and Information Technology, Government of India for supporting this research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gupta, K., Ahmad, A., Ghosal, T., Ekbal, A. (2021). ContriSci: A BERT-Based Multitasking Deep Neural Architecture to Identify Contribution Statements from Research Papers. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_34
DOI: https://doi.org/10.1007/978-3-030-91669-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science, Computer Science (R0)