Abstract
In this paper, we propose a semi-automatic tree annotating workbench for building a Korean treebank. Building a treebank generally requires enormous effort from annotators. To improve annotation efficiency, reduce the number of interventions required of the annotator, and help keep annotation consistent, we have developed a semi-automatic tree annotating workbench consisting of the following three stages: syntactic pattern extraction, syntactic pattern selection, and syntactic pattern application. The experiment was carried out with 27,966 tree-tagged sentences as a training set and 3,108 sentences as a test set. With the best feature set selected, the proposed workbench reduces the burden of manual annotation by about 47%.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, JH., Park, SY., Kwak, YJ., Rim, HC. (2004). A Semi-automatic Tree Annotating Workbench for Building a Korean Treebank. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_31
DOI: https://doi.org/10.1007/978-3-540-24630-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive