A Text Annotation Tool with Pre-annotation Based on Deep Learning

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11775)

Abstract

In this paper, we introduce an open-source tool, EPAD, supported by a pre-annotation module based on deep learning. EPAD proposes a novel annotation workflow that combines pre-annotation with manual annotation, improving both the efficiency and the quality of annotation. The pre-annotation module effectively reduces annotation time while improving the precision and recall of annotation. EPAD also includes several mechanisms that make the pre-annotation module easier to use. Designed for collaboration, EPAD provides administrators with annotation statistics and analysis functions. Experiments showed that EPAD shortened the total annotation time by almost 60.0% and improved the F-measure of annotation quality by 12.7%.
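The abstract reports annotation quality in terms of precision, recall, and F-measure but does not say how they are computed. The following is a minimal sketch, assuming exact-match scoring of entity spans against a gold standard and the balanced F-measure (F1). The span format, the entity labels, and the function name `precision_recall_f1` are hypothetical and chosen only for illustration; they are not taken from EPAD.

```python
from typing import Set, Tuple

# Assumed span format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four gold entities, three annotated, two exact matches.
gold = {(0, 2, "DISEASE"), (5, 8, "DRUG"), (10, 14, "SYMPTOM"), (20, 23, "DRUG")}
annotated = {(0, 2, "DISEASE"), (5, 8, "DRUG"), (10, 13, "SYMPTOM")}

p, r, f = precision_recall_f1(annotated, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Under exact matching, the third annotated span counts as an error because its end offset differs from the gold span, which lowers both precision and recall. A more lenient scheme (for example, partial-overlap matching) would score it differently.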

Supported by Sichuan Science and Technology Program (No. 2017SZYZF0002).

Notes

  1. https://github.com/cloudXia777/EPAD

  2. https://docs.python.org/3.6/library/tkinter.html

  3. https://keras.io/

  4. On average, there are 24 sentences per document, 100 characters per sentence and 5 entities per sentence.

  5. https://biendata.com/competition/CCKS2018_1/

Author information

Corresponding author

Correspondence to Fei Teng.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Teng, F., Ma, M., Ma, Z., Huang, L., Xiao, M., Li, X. (2019). A Text Annotation Tool with Pre-annotation Based on Deep Learning. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_39

  • DOI: https://doi.org/10.1007/978-3-030-29551-6_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29550-9

  • Online ISBN: 978-3-030-29551-6

  • eBook Packages: Computer Science, Computer Science (R0)
