Building Uyghur Dependency Treebank: Design Principles, Annotation Schema and Tools

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9442)

Abstract

Treebanks are a crucial source of information for NLP and linguistic research. In this paper, we describe the process of building a Uyghur dependency treebank, including the design principles, annotation schema and tools for corpus creation. The Uyghur Treebank is built from a corpus of public readings, employs a multi-tier representation to accommodate future uses, and defines about 23 dependency relations. This paper presents the preliminary results of the project and outlines a new idea for combining it with the Language Grid.

The research was supported by the National Natural Science Foundation of China (Grant No. 61262061) and Science & Technology Foundation of Xinjiang (Grant No. 201423120).

Author information

Corresponding author

Correspondence to Mairehaba Aili.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Aili, M., Xialifu, A., Maihefureti, Maimaitimin, S. (2016). Building Uyghur Dependency Treebank: Design Principles, Annotation Schema and Tools. In: Murakami, Y., Lin, D. (eds) Worldwide Language Service Infrastructure. WLSI 2015. Lecture Notes in Computer Science, vol 9442. Springer, Cham. https://doi.org/10.1007/978-3-319-31468-6_9

  • DOI: https://doi.org/10.1007/978-3-319-31468-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31467-9

  • Online ISBN: 978-3-319-31468-6

  • eBook Packages: Computer Science, Computer Science (R0)
