Abstract
This paper addresses grammatical ambiguity in the Tatar National Corpus and describes the methodology and software used to automate disambiguation. Grammatical ambiguity is widespread in agglutinative languages such as those of the Turkic and Finno-Ugric families. Disambiguation in the corpus relies on a context-oriented classification of ambiguity types, carried out on Tatar corpus data for the first time. In this study the corpus serves both as the source of research data and as the target to which the results are applied. Grammatical ambiguity types are detected automatically with a finite-state morphological analyzer and then classified. To build a grammatically disambiguated subcorpus, a dedicated software module was developed. It searches for ambiguous tokens in the corpus, collects statistical information, and supports creating and applying formal context-based disambiguation rules for different ambiguity types.
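The workflow outlined in the abstract (find ambiguous tokens, count ambiguity types, apply context-based rules) can be illustrated with a minimal Python sketch. It is not the authors' software: the names `Token`, `Rule`, `ambiguity_type`, `collect_stats` and `apply_rules`, the tag strings, and the toy sentence are all hypothetical, and the analyses are assumed to have already been produced by a morphological analyzer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    """A word form with every analysis the analyzer returned (hypothetical layout)."""
    form: str
    analyses: List[str]


def ambiguity_type(tok: Token) -> Optional[str]:
    """Label an ambiguous token by its sorted competing analyses; None if unambiguous."""
    if len(tok.analyses) < 2:
        return None
    return " | ".join(sorted(tok.analyses))


def collect_stats(sentences: List[List[Token]]) -> Counter:
    """Count how often each ambiguity type occurs in the corpus."""
    counts: Counter = Counter()
    for sent in sentences:
        for tok in sent:
            label = ambiguity_type(tok)
            if label is not None:
                counts[label] += 1
    return counts


@dataclass
class Rule:
    """A context rule: for one ambiguity type, keep one analysis if the context matches."""
    amb_type: str
    condition: Callable[[List[Token], int], bool]  # receives (sentence, token index)
    keep: str


def apply_rules(sent: List[Token], rules: List[Rule]) -> List[Token]:
    """Apply the first matching rule to each ambiguous token, left to right, in place."""
    for i, tok in enumerate(sent):
        label = ambiguity_type(tok)
        if label is None:
            continue
        for rule in rules:
            if rule.amb_type == label and rule.condition(sent, i):
                tok.analyses = [rule.keep]
                break
    return sent


# Toy data: placeholder forms and assumed tag names, not taken from the corpus.
NOUN, VERB, POSTP = "N+Sg+Nom", "V+Imp+2Sg", "PostP"
sentence = [Token("w1", [VERB, NOUN]), Token("w2", [POSTP])]

# Hypothetical rule: a noun/verb-ambiguous token directly before a postposition is a noun.
noun_before_postp = Rule(
    amb_type=" | ".join(sorted([NOUN, VERB])),
    condition=lambda s, i: i + 1 < len(s) and s[i + 1].analyses == [POSTP],
    keep=NOUN,
)

stats_before = collect_stats([sentence])
apply_rules(sentence, [noun_before_postp])
stats_after = collect_stats([sentence])
```

Keying each rule to an ambiguity type mirrors the classification step in the abstract: the statistics show which types are frequent enough to deserve a rule, and tokens that no rule matches keep all of their analyses.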
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gataullin, R., Khakimov, B., Suleymanov, D., Gilmullin, R. (2017). Context-Based Rules for Grammatical Disambiguation in the Tatar Language. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science(), vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_51
DOI: https://doi.org/10.1007/978-3-319-67077-5_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5
eBook Packages: Computer Science (R0)