Building Uyghur Dependency Treebank: Design Principles, Annotation Schema and Tools

Aili, Mairehaba; Xialifu, Aziguli; Maihefureti; Maimaitimin, Saimaiti

doi:10.1007/978-3-319-31468-6_9

Conference paper
First Online: 13 March 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9442)

Abstract

A treebank is a crucial source of information for NLP and linguistic research. In this paper, we describe the process of building a Uyghur dependency treebank, including the design principles, annotation schema, and tools for corpus creation. The Uyghur Treebank is built from a corpus of public readings, employs a multi-tier representation to allow for future extension, and defines about 23 dependency relations. This paper presents the preliminary results of the project and outlines a new idea for combining it with the Language Grid.

The research was supported by the National Natural Science Foundation of China (Grant No. 61262061) and Science & Technology Foundation of Xinjiang (Grant No. 201423120).

Author information

Authors and Affiliations

School of Information Science and Engineering, Xinjiang University, Urumqi, China
Mairehaba Aili & Maihefureti
School of Humanities, Xinjiang University, Urumqi, China
Aziguli Xialifu & Saimaiti Maimaitimin

Corresponding author

Correspondence to Mairehaba Aili.

Editor information

Editors and Affiliations

Unit of Design, Kyoto University, Kyoto, Japan
Yohei Murakami
Kyoto University, Kyoto, Japan
Donghui Lin

About this paper

Cite this paper

Aili, M., Xialifu, A., Maihefureti, Maimaitimin, S. (2016). Building Uyghur Dependency Treebank: Design Principles, Annotation Schema and Tools. In: Murakami, Y., Lin, D. (eds) Worldwide Language Service Infrastructure. WLSI 2015. Lecture Notes in Computer Science, vol 9442. Springer, Cham. https://doi.org/10.1007/978-3-319-31468-6_9

DOI: https://doi.org/10.1007/978-3-319-31468-6_9
Published: 13 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31467-9
Online ISBN: 978-3-319-31468-6
eBook Packages: Computer Science, Computer Science (R0)
