Tagging Assistant for Scientific Articles

Nasar, Zara; Jaffry, Syed Waqar; Malik, Muhammad Kamran

doi:10.1007/978-981-13-6052-7_30


Part of the book series: Communications in Computer and Information Science (CCIS, volume 932)

Included in the following conference series:

International Conference on Intelligent Technologies and Applications


Abstract

With the advent of the World Wide Web (WWW), the world is being overloaded with huge amounts of data. This data carries potential information that, once extracted, can be used for the betterment of humanity. Information can be extracted from this data through manual or automatic analysis. Manual analysis is neither scalable nor efficient, whereas automatic analysis relies on computing mechanisms that extract information from huge amounts of data. The WWW has also accelerated the growth of scientific literature, which makes literature review a laborious, time-consuming and cumbersome job for researchers. Hence there is a dire need to automatically extract potential information from the immense set of scientific articles in order to automate the process of literature review. Such a service requires trained machine learning models, and such models in turn require training datasets. Constructing a quality dataset often involves the use of annotation tools. A wide variety of annotation tools exists, but none is tailored to assist the annotation of scientific articles. Hence, in this study, a web-based annotation tool for scientific articles is developed in Python. The developed assistant employs state-of-the-art machine learning models to extract metadata from scientific articles as well as to process the article's text. It provides various filters to assist annotators. An article is divided into various textual constructs, including sections, paragraphs, sentences, tokens and lemmas. This division can help annotators by addressing their information needs efficiently. Hence, this annotation tool can significantly reduce the time needed to prepare datasets of full-text scientific articles.
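The division of an article into paragraphs, sentences and tokens, and a simple filter over the result, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`segment`, `filter_sentences`) are hypothetical, the regular expressions are naive stand-ins for the machine learning models mentioned in the abstract, and section detection and lemmatization are omitted because they would need an external NLP library.

```python
import re

def split_paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace
    # and an uppercase letter. Abbreviations will be split incorrectly.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph)
    return [s.strip() for s in parts if s.strip()]

def tokenize(sentence):
    # Words (keeping internal hyphens and apostrophes) or single
    # punctuation characters.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

def segment(text):
    # Returns a nested list: paragraphs -> sentences -> tokens.
    return [
        [tokenize(s) for s in split_sentences(p)]
        for p in split_paragraphs(text)
    ]

def filter_sentences(doc, keyword):
    # Hypothetical filter: (paragraph index, sentence index) pairs of
    # sentences containing the keyword as a token, case-insensitively.
    keyword = keyword.lower()
    return [
        (pi, si)
        for pi, paragraph in enumerate(doc)
        for si, tokens in enumerate(paragraph)
        if keyword in (t.lower() for t in tokens)
    ]
```

For example, `segment(text)` on a two-paragraph string yields a list of two paragraphs, each a list of token lists, and `filter_sentences(doc, "article")` returns the positions of the sentences an annotator might want to inspect first.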



Author information

Authors and Affiliations

Artificial Intelligence and Multidisciplinary Research Lab, Punjab University College of Information Technology, University of the Punjab, Lahore, 54000, Pakistan
Zara Nasar, Syed Waqar Jaffry & Muhammad Kamran Malik

Authors

Zara Nasar
Syed Waqar Jaffry
Muhammad Kamran Malik

Corresponding author

Correspondence to Zara Nasar.

Editor information

Editors and Affiliations

Department of Computer Science and IT, Islamia University of Bahawalpur, Baghdad, Pakistan
Imran Sarwar Bajwa
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
Fairouz Kamareddine
Department of Computer Engineering and Digital Systems, University of Sao Paulo, São Paulo, Brazil
Anna Costa

About this paper

Cite this paper

Nasar, Z., Jaffry, S.W., Malik, M.K. (2019). Tagging Assistant for Scientific Articles. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_30


DOI: https://doi.org/10.1007/978-981-13-6052-7_30
Published: 12 March 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6051-0
Online ISBN: 978-981-13-6052-7
eBook Packages: Computer Science, Computer Science (R0)
