
A segment-based annotation tool for Korean treebanks with minimal human intervention

Language Resources and Evaluation

Abstract

In this paper, we propose a segment-based annotation tool that provides appropriate interactivity between a human annotator and an automatic parser. The tool shows a preview of the complete sentence structure suggested by the parser and updates the preview whenever the annotator selects or cancels a segmentation point. The annotator can thus choose the sentence segments that maximize parsing accuracy and minimize human intervention. Experimental results show that the proposed tool reduces human intervention by approximately 39% compared with manual annotation. The Sejong Korean treebank, a large-scale treebank, was constructed with the proposed annotation tool.




Notes

  1. The segmentation model, \(\mathop{\mathrm{argmax}}\limits_{s_{1n}} \prod_{i=0}^{n} P(s_{i} \mid t_{i}, t_{i+1})\), achieves 81.31% precision and 64.62% recall. Precision is the proportion of candidate segmentation points ‘)(’ generated by the parsing model that are correct; recall is the proportion of correct segmentation points ‘)(’ in the test set of the treebank that appear among those candidates.
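
As an illustration of the note above, the following is a minimal Python sketch, not the authors' implementation. It assumes that each \(s_{i}\) is a binary decision (boundary or no boundary) between adjacent tags \(t_{i}\) and \(t_{i+1}\). Under that assumption every factor of the product involves a different \(s_{i}\), so the argmax reduces to an independent decision at each position. The tag names, the probability table, and the function names are hypothetical, and sentence-boundary padding is omitted for brevity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: P(boundary | left_tag, right_tag).
BoundaryProbs = Dict[Tuple[str, str], float]


def segment(tags: List[str], probs: BoundaryProbs, default: float = 0.0) -> Set[int]:
    """Return positions i where a boundary is placed between tags[i] and tags[i+1].

    Assumes s_i is binary, so P(no boundary) = 1 - P(boundary) and the
    product over positions is maximized by choosing, at each position,
    whichever outcome has probability above 0.5.
    """
    boundaries: Set[int] = set()
    for i in range(len(tags) - 1):
        p_boundary = probs.get((tags[i], tags[i + 1]), default)
        if p_boundary > 0.5:
            boundaries.add(i)
    return boundaries


def precision_recall(predicted: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Precision: correct predicted points / all predicted points.
    Recall: correct predicted points / all gold points."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for illustration only).
toy_tags = ["NP", "VP", "NP", "VP"]
toy_probs = {("VP", "NP"): 0.9, ("NP", "VP"): 0.2}
toy_gold = {1, 2}

predicted = segment(toy_tags, toy_probs)
precision, recall = precision_recall(predicted, toy_gold)
```

In this toy case only one boundary is predicted and it is correct, while one gold boundary is missed, so precision is higher than recall. The note reports the same pattern: 81.31% precision against 64.62% recall.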


Acknowledgments

This work was supported partly by grant R01-2006-000-11162-0 from the Korea Science & Engineering Foundation’s Basic Research Program and partly by the second stage of the BK-21 project.

Author information

Corresponding author

Correspondence to Hae-Chang Rim.


About this article

Cite this article

Park, SY., Song, YI. & Rim, HC. A segment-based annotation tool for Korean treebanks with minimal human intervention. Lang Resources & Evaluation 40, 281–289 (2006). https://doi.org/10.1007/s10579-007-9029-5


  • DOI: https://doi.org/10.1007/s10579-007-9029-5
