Abstract
Automatically building semantic resources is essential for low-resource languages such as Uyghur. However, Uyghur lacks a publicly available evaluation dataset for automatically building semantic resources such as WordNet. To address this problem, we first build the largest Uyghur-English and English-Uyghur dictionaries by exploiting many available online and offline resources. Then, using Princeton WordNet (PWN) 3.0 and the Contemporary Uyghur Detailed Dictionary (CUDD), we construct an English-Uyghur WordNet evaluation dataset, which is publicly available (https://github.com/kaharjan/uywordnet). In this dataset, more than 73,000 English synsets are automatically mapped to Uyghur, of which over 20,000 are manually annotated. The corresponding Uyghur words include definitions and examples in Uyghur-language context. We also propose a Synset Mapping based on Word Embeddings (SMWE) method. The experimental results on the dataset are promising.
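The abstract does not spell out how SMWE works, so the following is only a generic sketch of embedding-based synset mapping, not the authors' method. It assumes that English synset lemmas and Uyghur candidate translations already have vectors in a shared embedding space, and it ranks candidates by cosine similarity to the centroid of the synset's lemma vectors. The function names, the toy vectors, and the candidate labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(synset_vectors, candidate_vectors):
    """Rank candidate words by similarity to the synset centroid, best first.

    synset_vectors: vectors of the English lemmas in one synset (assumed given).
    candidate_vectors: dict mapping each candidate word to its vector; the
    candidates would come from a bilingual dictionary lookup (assumed given).
    """
    center = centroid(synset_vectors)
    scored = [(word, cosine(center, vec)) for word, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made 2-dimensional vectors; real embeddings would be learned.
synset_vectors = [[1.0, 0.0], [0.8, 0.2]]
candidate_vectors = {
    "candidate_a": [0.9, 0.1],  # hypothetical candidate close to the synset
    "candidate_b": [0.0, 1.0],  # hypothetical candidate far from the synset
}
ranking = rank_candidates(synset_vectors, candidate_vectors)
```

In this toy setting `candidate_a` ranks above `candidate_b`. A practical system would likely also need a similarity threshold or a top-k cutoff to decide which candidates to accept, which is left open here.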
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) under grant 61532001.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Abiderexiti, K., Sun, M. (2019). Construction of an English-Uyghur WordNet Dataset. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_31
DOI: https://doi.org/10.1007/978-3-030-32381-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3
eBook Packages: Computer Science, Computer Science (R0)