Constructing a Turkish Constituency Parse TreeBank

  • Conference paper
Information Sciences and Systems 2015

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 363)

Abstract

In this paper, we describe our initial efforts to create a Turkish constituency parse treebank by utilizing the English Penn Treebank. We employ a semi-automated approach to annotation. In our previous work [18], the English parse trees were manually translated to Turkish. Here, as a first step, the words are morphologically annotated in a semi-automated manner. As a second step, a rule-based approach is used to refine the parse trees based on the morphological analyses of the words. We generated Turkish phrase structure trees for 5143 Penn Treebank sentences that contain fewer than 15 tokens. The annotated corpus can be used in statistical natural language processing studies to develop tools such as constituency parsers and statistical machine translation systems for Turkish.
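
The sentence selection mentioned in the abstract (Penn Treebank sentences with fewer than 15 tokens) can be illustrated with a minimal sketch. This is not the authors' tooling: the function names `parse_tree`, `leaves`, and `select_short` are hypothetical, the input is assumed to be one Penn-style bracketed tree per string, and the counting rule (punctuation counts as a token, `-NONE-` empty elements do not) is an assumption, since the abstract does not say how tokens were counted.

```python
import re


def parse_tree(text):
    """Parse one Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]  # an unlabeled root such as "( (S ...))" keeps ""
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def leaves(tree, skip_empty=True):
    """Return the words of a tree, left to right.

    Assumption: leaves under a -NONE- tag are traces and are not counted.
    """
    label, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            if not (skip_empty and label == "-NONE-"):
                words.append(child)
        else:
            words.extend(leaves(child, skip_empty))
    return words


def select_short(trees, max_tokens=15):
    """Keep the bracketed trees whose sentences have fewer than max_tokens tokens."""
    return [t for t in trees if len(leaves(parse_tree(t))) < max_tokens]
```

With this sketch, `select_short(corpus)` would return the subset of bracketed trees eligible for translation under the stated length limit; changing `skip_empty` or `max_tokens` changes the counting convention, which is exactly the part the abstract leaves unspecified.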

References

  1. Abeillé, A., Clément, L., Toussenel, F.: Building a treebank for French. In: Treebanks, pp. 165–187. Springer (2003)

  2. Atalay, N.B., Oflazer, K., Say, B.: The annotation process in the Turkish treebank. In: 4th International Workshop on Linguistically Interpreted Corpora (2003)

  3. Brants, S., Dipper, S., Hansen, S., Lezius, W., Smith, G.: The TIGER treebank. In: Proceedings of the Workshop on Treebanks and Linguistic Theories, vol. 168 (2002)

  4. Cakici, R.: Automatic induction of a CCG grammar for Turkish. In: ACL Student Research Workshop (2005)

  5. Cetinoglu, O., Oflazer, K.: Morphology-syntax interface for Turkish LFG. In: Computational Linguistics and Annual Meeting of the Association (2006)

  6. Cetinoglu, O., Oflazer, K.: Integrating derivational morphology into syntax. In: Recent Advances in Natural Language Processing V (2009)

  7. Csendes, D., Csirik, J., Gyimóthy, T., Kocsor, A.: The Szeged treebank. In: Text, Speech and Dialogue, pp. 123–131. Springer (2005)

  8. Eryigit, G., Nivre, J., Oflazer, K.: Dependency parsing of Turkish. Comput. Linguist. (2008)

  9. Eryigit, G., Oflazer, K.: Statistical dependency parsing for Turkish. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)

  10. Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., Ginter, F.: Building the essential resources for Finnish: The Turku dependency treebank. Lang. Resour. Eval. 1–39 (2013)

  11. Kornfilt, J.: Turkish. Routledge (1997)

  12. Maamouri, M., Bies, A., Buckwalter, T., Mekki, W.: The Penn Arabic treebank: Building a large-scale annotated Arabic corpus. In: NEMLAR Conference on Arabic Language Resources and Tools, pp. 102–109 (2004)

  13. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: The Penn treebank. Comput. Linguist. 19(2), 313–330 (1993)

  14. Oflazer, K.: Two-level description of Turkish morphology. Literary Linguist. Comput. 9(2), 137–148 (1994)

  15. Riedel, S., Cakici, R., Meza-Ruiz, I.: Multi-lingual dependency parsing with incremental integer linear programming (2006)

  16. Ruket, C., Baldridge, J.: Projective and non-projective Turkish parsing. In: Fifth International Workshop on Treebanks and Linguistic Theories (2006)

  17. Xue, N., Xia, F., Chiou, F.D., Palmer, M.: The Penn Chinese treebank: Phrase structure annotation of a large corpus. Nat. Lang. Eng. 11(2), 207–238 (2005)

  18. Yıldız, O.T., Solak, E., Görgün, O., Ehsani, R.: Constructing a Turkish-English parallel treebank. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 112–117. Association for Computational Linguistics, Baltimore, Maryland (2014)

  19. Yuret, D.: Dependency parsing as a classification problem. In: Proceedings of the Tenth Conference on Computational Natural Language Learning (2006)

Author information

Corresponding author

Correspondence to Olcay Taner Yıldız.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yıldız, O.T., Solak, E., Çandır, Ş., Ehsani, R., Görgün, O. (2016). Constructing a Turkish Constituency Parse TreeBank. In: Abdelrahman, O., Gelenbe, E., Gorbil, G., Lent, R. (eds) Information Sciences and Systems 2015. Lecture Notes in Electrical Engineering, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-319-22635-4_31

  • DOI: https://doi.org/10.1007/978-3-319-22635-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22634-7

  • Online ISBN: 978-3-319-22635-4

  • eBook Packages: Engineering, Engineering (R0)