Tagging Assistant for Scientific Articles

  • Conference paper
Intelligent Technologies and Applications (INTAP 2018)

Abstract

With the advent of the World Wide Web (WWW), the world is being overloaded with huge volumes of data. This data carries potential information that, once extracted, can be used for the betterment of humanity. Information can be extracted from this data through manual or automatic analysis. Manual analysis is neither scalable nor efficient, whereas automatic analysis relies on computing mechanisms that aid information extraction over large amounts of data. The WWW has also driven overall growth in scientific literature, which makes literature review a laborious and time-consuming job for researchers. Hence, there is a dire need to automatically extract potential information from the immense set of scientific articles in order to automate the process of literature review. Such a service requires trained machine learning models, and such models in turn require a training dataset. Constructing a quality dataset often involves the use of annotation tools. A wide variety of annotation tools exists, but none is tailored to assist the annotation of scientific articles. Hence, in this study, a web-based annotation tool for scientific articles is developed in Python. The developed assistant employs state-of-the-art machine learning models to extract metadata from scientific articles as well as to process an article’s text. It provides various filters to assist annotators. An article is divided into various textual constructs, including sections, paragraphs, sentences, tokens and lemmas. This division can help annotators by addressing their information needs efficiently. Hence, this annotation tool can significantly reduce the time needed to prepare datasets for full-text scientific articles.
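
The division of an article into paragraphs, sentences, tokens and lemmas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Sentence`, `Paragraph`, `toy_lemma` and `split_article` are hypothetical, section detection is omitted, the regular expressions are simplifying assumptions, and `toy_lemma` is only a crude placeholder for the lemmatizer that a trained NLP model would normally provide.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    text: str
    tokens: List[str] = field(default_factory=list)
    lemmas: List[str] = field(default_factory=list)


@dataclass
class Paragraph:
    sentences: List[Sentence] = field(default_factory=list)


def toy_lemma(token: str) -> str:
    """Placeholder (assumption): lowercase and strip a trailing plural 's'.

    A real tool would call a proper lemmatizer here.
    """
    word = token.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def split_article(text: str) -> List[Paragraph]:
    """Split plain text into paragraphs, sentences, tokens and toy lemmas."""
    paragraphs = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue
        paragraph = Paragraph()
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", block.strip()):
            if not sent:
                continue
            # Tokens are runs of word characters or single punctuation marks.
            tokens = re.findall(r"\w+|[^\w\s]", sent)
            lemmas = [toy_lemma(t) for t in tokens]
            paragraph.sentences.append(Sentence(sent, tokens, lemmas))
        paragraphs.append(paragraph)
    return paragraphs


if __name__ == "__main__":
    sample = "Annotation tools help. They save time.\n\nModels need datasets."
    for i, para in enumerate(split_article(sample), start=1):
        for sent in para.sentences:
            print(i, sent.tokens, sent.lemmas)
```

Keeping each level as a nested structure lets a filter operate at the granularity an annotator needs, for example listing only the sentences whose lemmas contain a given term.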

Author information

Corresponding author

Correspondence to Zara Nasar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nasar, Z., Jaffry, S.W., Malik, M.K. (2019). Tagging Assistant for Scientific Articles. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_30

  • DOI: https://doi.org/10.1007/978-981-13-6052-7_30

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6051-0

  • Online ISBN: 978-981-13-6052-7

  • eBook Packages: Computer Science, Computer Science (R0)
