Construction of an English-Uyghur WordNet Dataset

Abiderexiti, Kahaerjiang; Sun, Maosong

doi:10.1007/978-3-030-32381-3_31

Conference paper
First Online: 13 October 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11856)

Abstract

Automatically building semantic resources is essential for low-resource languages such as Uyghur. However, Uyghur lacks publicly available evaluation datasets for automatically building semantic resources such as WordNet. To address this problem, we first build the largest Uyghur-English and English-Uyghur dictionaries by exploiting a wide range of online and offline resources. Then, using Princeton WordNet (PWN) 3.0 and the Contemporary Uyghur Detailed Dictionary (CUDD), we construct a publicly available English-Uyghur WordNet evaluation dataset (https://github.com/kaharjan/uywordnet). In this dataset, more than 73,000 English synsets are automatically mapped to Uyghur, of which over 20,000 are manually annotated. The corresponding Uyghur words come with definitions and usage examples in Uyghur. We also propose a Synset Mapping based on Word Embeddings (SMWE) method, which achieves promising experimental results on the dataset.
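The abstract names the SMWE method without describing its internals. As a rough illustration only, here is a minimal sketch of one generic way embedding-based synset mapping can work. It is not the authors' SMWE algorithm. It assumes that English and Uyghur word vectors already share one cross-lingual space, that candidate Uyghur translations come from a bilingual English-Uyghur dictionary, that a synset is represented by the mean of its English lemma vectors, and that candidates are kept when their cosine similarity reaches a threshold. The function names, the `threshold` parameter, and the toy vectors and dictionary entries are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def map_synset(
    synset_lemmas: List[str],
    en_to_ug: Dict[str, List[str]],
    en_emb: Dict[str, Vector],
    ug_emb: Dict[str, Vector],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical embedding-based scoring of Uyghur candidates for one English synset."""
    # 1. Represent the synset by the centroid of its known English lemma vectors.
    known = [en_emb[lemma] for lemma in synset_lemmas if lemma in en_emb]
    if not known:
        return []
    synset_vec = centroid(known)
    # 2. Gather Uyghur translation candidates of every lemma from the dictionary.
    candidates = {ug for lemma in synset_lemmas for ug in en_to_ug.get(lemma, [])}
    # 3. Score candidates that have a vector; keep those at or above the threshold.
    scored = [(ug, cosine(synset_vec, ug_emb[ug])) for ug in candidates if ug in ug_emb]
    kept = [(ug, score) for ug, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-d vectors for the financial sense of "bank" (made up for illustration).
    en_vectors = {"bank": [0.9, 0.3], "depository": [1.0, 0.0]}
    ug_vectors = {"banka": [1.0, 0.1], "qirghaq": [0.1, 1.0]}
    dictionary = {"bank": ["banka", "qirghaq"]}
    # Keeps "banka" (bank, financial) and drops "qirghaq" (bank, shore).
    print(map_synset(["bank", "depository"], dictionary, en_vectors, ug_vectors))
```

Averaging lemma vectors lets the other members of a synset (here "depository") pull the representation toward the intended sense, which is what separates the financial translation from the riverbank one. A real system would also need to handle lemmas missing from the dictionary or embedding vocabulary and to tune the threshold on annotated data.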

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC) under grant 61532001.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Institute for Artificial Intelligence, State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
Kahaerjiang Abiderexiti & Maosong Sun

Corresponding author

Correspondence to Maosong Sun.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Fudan University, Shanghai, China
Xuanjing Huang
University of Illinois at Urbana-Champaign, Illinois, USA
Heng Ji
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

About this paper

Cite this paper

Abiderexiti, K., Sun, M. (2019). Construction of an English-Uyghur WordNet Dataset. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_31

DOI: https://doi.org/10.1007/978-3-030-32381-3_31
Published: 13 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3
eBook Packages: Computer Science, Computer Science (R0)