Construction of an English-Uyghur WordNet Dataset

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11856)

Abstract

Automatically building semantic resources is essential for low-resource languages such as Uyghur. However, Uyghur lacks publicly available evaluation datasets for the automatic construction of semantic resources such as WordNet. To address this problem, we first build the largest Uyghur-English and English-Uyghur dictionaries by drawing on many available online and offline resources. Then, using Princeton WordNet (PWN) 3.0 and the Contemporary Uyghur Detailed Dictionary (CUDD), we construct an English-Uyghur WordNet evaluation dataset, which is publicly available (https://github.com/kaharjan/uywordnet). In this dataset, more than 73,000 English synsets are mapped to Uyghur automatically, of which over 20,000 are annotated manually. The corresponding Uyghur words include definitions and examples in Uyghur-language context. We also propose a method called Synset Mapping based on Word Embeddings (SMWE). The experimental results on the dataset are promising.
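The abstract does not describe how SMWE works internally. As a rough illustration of the general idea of mapping a synset to dictionary candidates by embedding similarity, and not a reconstruction of the authors' method, the sketch below ranks candidate translations by cosine similarity to the centroid of a synset's lemma vectors. It assumes that English and Uyghur vectors already live in one shared space; the function names, candidate labels, and two-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(synset_lemma_vectors, candidate_vectors):
    """Rank candidate translations by similarity to the synset centroid.

    synset_lemma_vectors: vectors of the English lemmas in one synset.
    candidate_vectors: dict mapping a candidate label to its vector,
        assumed (hypothetically) to be in the same space as the lemmas.
    Returns (label, score) pairs sorted from most to least similar.
    """
    centroid = mean_vector(synset_lemma_vectors)
    scored = [(label, cosine(centroid, vec)) for label, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: placeholder vectors and labels, not taken from the dataset.
synset_lemma_vectors = [[1.0, 0.0], [0.8, 0.2]]
candidate_vectors = {
    "uy_candidate_a": [0.9, 0.1],
    "uy_candidate_b": [0.0, 1.0],
}

ranked = rank_candidates(synset_lemma_vectors, candidate_vectors)
for label, score in ranked:
    print(f"{label}\t{score:.3f}")
```

In a real setting, the candidates would come from a bilingual dictionary lookup for each lemma, and a score threshold or a manual annotation pass would decide which mappings to keep.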

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC) under grant 61532001.

Author information

Correspondence to Maosong Sun.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Abiderexiti, K., Sun, M. (2019). Construction of an English-Uyghur WordNet Dataset. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_31

  • DOI: https://doi.org/10.1007/978-3-030-32381-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32380-6

  • Online ISBN: 978-3-030-32381-3

  • eBook Packages: Computer Science, Computer Science (R0)
