Abstract
Treebanks are a crucial source of information for NLP and linguistic research. In this paper, we describe the process of building a Uyghur dependency treebank, including the design principles, annotation schema, and tools for corpus creation. The Uyghur Treebank is built from a public readings corpus, employs a multi-tier representation to allow for future extension, and defines about 23 dependency relations. This paper presents the preliminary results of the project and outlines a new idea for combining it with the Language Grid.
The research was supported by the National Natural Science Foundation of China (Grant No. 61262061) and Science & Technology Foundation of Xinjiang (Grant No. 201423120).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Aili, M., Xialifu, A., Maihefureti, Maimaitimin, S. (2016). Building Uyghur Dependency Treebank: Design Principles, Annotation Schema and Tools. In: Murakami, Y., Lin, D. (eds) Worldwide Language Service Infrastructure. WLSI 2015. Lecture Notes in Computer Science, vol 9442. Springer, Cham. https://doi.org/10.1007/978-3-319-31468-6_9
DOI: https://doi.org/10.1007/978-3-319-31468-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31467-9
Online ISBN: 978-3-319-31468-6
eBook Packages: Computer Science (R0)