A Text Annotation Tool with Pre-annotation Based on Deep Learning

Teng, Fei; Ma, Minbo; Ma, Zheng; Huang, Lufei; Xiao, Ming; Li, Xuan

doi:10.1007/978-3-030-29551-6_39

Fei Teng ORCID: orcid.org/0000-0001-9535-7245¹¹,
Minbo Ma¹¹,
Zheng Ma¹¹,
Lufei Huang¹²,
Ming Xiao¹³ &
Xuan Li¹²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11775)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

In this paper, we introduce EPAD, an open-source text annotation tool supported by a pre-annotation module based on deep learning. EPAD proposes a novel annotation workflow that combines pre-annotation with manual annotation, improving both the efficiency and the quality of annotation. The pre-annotation module effectively reduces annotation time while improving the precision and recall of the annotations. EPAD also provides mechanisms that make the pre-annotation module easier to use. As a collaborative tool, EPAD gives administrators annotation statistics and analysis functions. Experiments showed that EPAD reduced total annotation time by almost 60.0% and improved the F-measure of annotation quality by 12.7%.
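The abstract reports annotation quality in terms of precision, recall, and F-measure. The sketch below shows one common way these quantities can be computed for span annotations. It assumes exact-match scoring of (start, end, label) triples against a gold standard; the paper's actual evaluation protocol is not given in this text, and the function name, span representation, and entity labels are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one annotation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure for span annotations."""
    # A predicted span counts as correct only if offsets and label all match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only (not from the paper).
gold = {(0, 3, "DISEASE"), (10, 14, "DRUG"), (20, 25, "SYMPTOM"), (30, 33, "DRUG")}
annotated = {(0, 3, "DISEASE"), (10, 14, "DRUG"), (20, 24, "SYMPTOM")}

p, r, f = precision_recall_f1(gold, annotated)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

In this toy example the third annotation has a boundary error, so it counts against both precision and recall. Exact matching is the strictest choice; a relaxed (overlap-based) criterion would score it differently.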

Supported by Sichuan Science and Technology Program (No. 2017SZYZF0002).

Notes

1. https://github.com/cloudXia777/EPAD
2. https://docs.python.org/3.6/library/tkinter.html
3. https://keras.io/
4. On average, there are 24 sentences per document, 100 characters per sentence and 5 entities per sentence.
5. https://biendata.com/competition/CCKS2018_1/

Author information

Authors and Affiliations

School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China
Fei Teng, Minbo Ma & Zheng Ma
The Third People’s Hospital of Chengdu, Chengdu, China
Lufei Huang & Xuan Li
School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden
Ming Xiao

Corresponding author

Correspondence to Fei Teng.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Christos Douligeris
University of Vienna, Vienna, Austria
Dimitris Karagiannis
University of Piraeus, Piraeus, Greece
Dimitris Apostolou

About this paper

Cite this paper

Teng, F., Ma, M., Ma, Z., Huang, L., Xiao, M., Li, X. (2019). A Text Annotation Tool with Pre-annotation Based on Deep Learning. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_39

DOI: https://doi.org/10.1007/978-3-030-29551-6_39
Published: 21 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science, Computer Science (R0)
