A segment-based annotation tool for Korean treebanks with minimal human intervention

Park, So-Young; Song, Young-In; Rim, Hae-Chang

doi:10.1007/s10579-007-9029-5

Published: 17 July 2007

Language Resources and Evaluation, Volume 40, pages 281–289 (2006)

So-Young Park¹,
Young-In Song² &
Hae-Chang Rim²

Abstract

In this paper, we propose a segment-based annotation tool that provides appropriate interactivity between a human annotator and an automatic parser. The tool displays a preview of the complete sentence structure suggested by the parser and updates the preview whenever the annotator selects or cancels a segmentation point. The annotator can thus choose the sentence segments that maximize parsing accuracy while minimizing human intervention. Experimental results show that the proposed tool reduces human intervention by approximately 39% compared with manual annotation. The Sejong Korean treebank, a large-scale treebank, was constructed with the proposed annotation tool.

Notes

The segmentation model, \(\mathop{\mathrm{argmax}}\limits_{s_{1n}} \prod_{i=0}^{n} P(s_{i} \mid t_{i}, t_{i+1})\), achieves 81.31% precision and 64.62% recall. Precision is the ratio of correct candidate segmentation points ‘)(’ to all candidate segmentation points ‘)(’ generated by the parsing model; recall is the ratio of correct candidate segmentation points ‘)(’ to the correct segmentation points ‘)(’ in the test set of the treebank.
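
As a rough illustration of the note above, the following Python sketch assumes that each s_i is a binary decision (segmentation point or not) between the adjacent tags t_i and t_{i+1}. The source does not define these symbols, so this reading, the probability table, and the tag names are hypothetical. Under that assumption the product factorizes over positions, so the argmax can be taken independently at each position. The second function computes precision and recall as the note defines them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: P(s = 1 | t_i, t_{i+1}), the probability that a
# segmentation point falls between two adjacent tags. Not from the paper.
SplitProb = Dict[Tuple[str, str], float]


def best_segmentation(tags: List[str], p_split: SplitProb,
                      default: float = 0.0) -> List[int]:
    """Return the 0/1 sequence maximizing the product of P(s_i | t_i, t_{i+1}).

    Assumes binary s_i, so each factor is maximized on its own:
    choose 1 when P(s = 1 | ...) > 0.5, otherwise 0.
    Boundary tags, if needed, can be added to `tags` by the caller.
    """
    decisions = []
    for left, right in zip(tags, tags[1:]):
        p = p_split.get((left, right), default)
        decisions.append(1 if p > 0.5 else 0)
    return decisions


def precision_recall(proposed: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Precision and recall over sets of segmentation-point positions."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy input with made-up tags and probabilities.
example_tags = ["NP", "VP", "NP", "VP"]
example_probs = {("VP", "NP"): 0.9, ("NP", "VP"): 0.2}
example_decisions = best_segmentation(example_tags, example_probs)
```

With the toy input, only the boundary between "VP" and "NP" is selected as a segmentation point. A real model would estimate the probabilities from a treebank rather than from a hand-written table.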

Acknowledgments

This work was supported partly by grant R01-2006-000-11162-0 from the Korea Science & Engineering Foundation’s Basic Research Program and partly by the second stage of the BK-21 project.

Author information

Authors and Affiliations

College of Computer Software & Media Technology, SangMyung University, 7 Hongji-dong, Jongno-ku, Seoul, 110-743, Korea
So-Young Park
Department of Computer Science & Engineering, Korea University, 5-ka 1, Anam-dong, Seongbuk-ku, Seoul, 136-701, Korea
Young-In Song & Hae-Chang Rim

Corresponding author

Correspondence to Hae-Chang Rim.

About this article

Cite this article

Park, SY., Song, YI. & Rim, HC. A segment-based annotation tool for Korean treebanks with minimal human intervention. Lang Resources & Evaluation 40, 281–289 (2006). https://doi.org/10.1007/s10579-007-9029-5

Received: 24 August 2006
Accepted: 02 June 2007
Published: 17 July 2007
Issue Date: December 2006
DOI: https://doi.org/10.1007/s10579-007-9029-5
