Abstract
In this paper, we describe our initial efforts toward creating a Turkish constituency parse treebank based on the English Penn Treebank. We employ a semi-automated annotation approach. In our previous work [18], the English parse trees were manually translated into Turkish. Here, we first annotate the words morphologically in a semi-automatic manner. As a second step, we refine the parse trees with a rule-based approach that draws on the morphological analyses of the words. We generated Turkish phrase structure trees for the 5143 Penn Treebank sentences that contain fewer than 15 tokens. The annotated corpus can be used in statistical natural language processing studies to develop tools such as constituency parsers and statistical machine translation systems for Turkish.
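The sentence selection criterion (fewer than 15 tokens) can be illustrated with a minimal sketch. It is not the tooling used for the treebank: the function names `parse_tree`, `leaves`, and `is_short` are hypothetical, the input is assumed to be a single Penn-style bracketed tree with a labeled root, and skipping `-NONE-` trace leaves when counting tokens is an assumption, since the abstract does not say how tokens were counted.

```python
def parse_tree(text):
    """Parse one Penn-style bracketed tree into nested (label, children) tuples.

    Assumes every bracket has a label, e.g. "(S (NP (DT The) (NN cat)))".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words of a tree left to right, skipping -NONE- traces (assumption)."""
    label, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            if label != "-NONE-":
                words.append(child)
        else:
            words.extend(leaves(child))
    return words


MAX_TOKENS = 15  # threshold stated in the abstract


def is_short(tree_text, limit=MAX_TOKENS):
    """True if the sentence has fewer than `limit` tokens."""
    return len(leaves(parse_tree(tree_text))) < limit
```

Applied to a list of bracketed trees, `[t for t in trees if is_short(t)]` would keep only the sentences below the threshold.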
References
Abeillé, A., Clément, L., Toussenel, F.: Building a treebank for French. In: Treebanks, pp. 165–187. Springer (2003)
Atalay, N.B., Oflazer, K., Say, B.: The annotation process in the Turkish treebank. In: 4th International Workshop on Linguistically Interpreted Corpora (2003)
Brants, S., Dipper, S., Hansen, S., Lezius, W., Smith, G.: The TIGER treebank. In: Proceedings of the Workshop on Treebanks and Linguistic Theories, vol. 168 (2002)
Cakici, R.: Automatic induction of a CCG grammar for Turkish. In: ACL Student Research Workshop (2005)
Cetinoglu, O., Oflazer, K.: Morphology-syntax interface for Turkish LFG. In: Computational Linguistics and Annual Meeting of the Association (2006)
Cetinoglu, O., Oflazer, K.: Integrating derivational morphology into syntax. In: Recent Advances in Natural Language Processing V (2009)
Csendes, D., Csirik, J., Gyimóthy, T., Kocsor, A.: The Szeged treebank. In: Text, Speech and Dialogue, pp. 123–131. Springer (2005)
Eryigit, G., Nivre, J., Oflazer, K.: Dependency parsing of Turkish. Comput. Linguist. (2008)
Eryigit, G., Oflazer, K.: Statistical dependency parsing for Turkish. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)
Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., Ginter, F.: Building the essential resources for Finnish: The Turku dependency treebank. Lang. Resour. Eval. 1–39 (2013)
Kornfilt, J.: Turkish. Routledge (1997)
Maamouri, M., Bies, A., Buckwalter, T., Mekki, W.: The Penn Arabic treebank: Building a large-scale annotated Arabic corpus. In: NEMLAR Conference on Arabic Language Resources and Tools, pp. 102–109 (2004)
Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
Oflazer, K.: Two-level description of Turkish morphology. Literary Linguist. Comput. 9(2), 137–148 (1994)
Riedel, S., Cakici, R., Meza-Ruiz, I.: Multi-lingual dependency parsing with incremental integer linear programming (2006)
Ruket, C., Baldridge, J.: Projective and non-projective Turkish parsing. In: Fifth International Workshop on Treebanks and Linguistic Theories (2006)
Xue, N., Xia, F., Chiou, F.D., Palmer, M.: The Penn Chinese treebank: Phrase structure annotation of a large corpus. Nat. Lang. Eng. 11(2), 207–238 (2005)
Yıldız, O.T., Solak, E., Görgün, O., Ehsani, R.: Constructing a Turkish-English parallel treebank. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 112–117. Association for Computational Linguistics, Baltimore, Maryland (2014)
Yuret, D.: Dependency parsing as a classification problem. In: Proceedings of the Tenth Conference on Computational Natural Language Learning (2006)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Yıldız, O.T., Solak, E., Çandır, Ş., Ehsani, R., Görgün, O. (2016). Constructing a Turkish Constituency Parse TreeBank. In: Abdelrahman, O., Gelenbe, E., Gorbil, G., Lent, R. (eds) Information Sciences and Systems 2015. Lecture Notes in Electrical Engineering, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-319-22635-4_31
DOI: https://doi.org/10.1007/978-3-319-22635-4_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22634-7
Online ISBN: 978-3-319-22635-4
eBook Packages: Engineering, Engineering (R0)