Point the Point: Uyghur Morphological Segmentation Using PointerNetwork with GRU

  • Conference paper
Chinese Computational Linguistics (CCL 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11856)

Abstract

Uyghur is an agglutinative language with many morphemes, so processing it requires segmenting words into morphemes, a task known as morphological segmentation. Previous work treats morphological segmentation as a tagging task, classifying each character into one of four classes, \(\{b,m,e,s\}\). However, these labels are not independent of one another, which makes the models prone to overfitting. We propose a new method for the segmentation task: instead of using these labels, we model only the segmentation points. The model used in our method is more robust and easier to train than previous ones. Applied to Uyghur morphological segmentation, it achieves high accuracy, and higher recall and F1 score than previous models.
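To make the contrast between the two target representations concrete, the following minimal sketch converts one segmented word into per-character \(\{b,m,e,s\}\) tags and into a list of segmentation points. It is an illustration only, not the authors' implementation. It assumes the common reading of the labels (b = begin, m = middle, e = end, s = single-character morpheme), which the abstract does not spell out, and it uses a hypothetical placeholder word rather than real Uyghur data. All function names are made up for this example.

```python
from typing import List

# Assumed label semantics (not stated in the abstract):
# b = first character of a morpheme, m = middle, e = last, s = single-character morpheme.
ALLOWED_NEXT = {"b": {"m", "e"}, "m": {"m", "e"}, "e": {"b", "s"}, "s": {"b", "s"}}


def morphemes_to_bmes(morphemes: List[str]) -> List[str]:
    """Tag every character of the word with one of b, m, e, s."""
    tags: List[str] = []
    for morpheme in morphemes:
        if len(morpheme) == 1:
            tags.append("s")
        else:
            tags.extend(["b"] + ["m"] * (len(morpheme) - 2) + ["e"])
    return tags


def is_valid_bmes(tags: List[str]) -> bool:
    """Check the constraints between labels; e.g. 'b' cannot directly follow 'b'."""
    if not tags:
        return True
    if tags[0] not in {"b", "s"} or tags[-1] not in {"e", "s"}:
        return False
    return all(nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(tags, tags[1:]))


def morphemes_to_points(morphemes: List[str]) -> List[int]:
    """Return the character offsets at which the word is cut."""
    points: List[int] = []
    offset = 0
    for morpheme in morphemes[:-1]:
        offset += len(morpheme)
        points.append(offset)
    return points


def points_to_morphemes(word: str, points: List[int]) -> List[str]:
    """Rebuild the morphemes from the word and its segmentation points."""
    bounds = [0] + list(points) + [len(word)]
    return [word[i:j] for i, j in zip(bounds, bounds[1:])]


# Hypothetical placeholder word "abcdef" segmented as abc + de + f.
segmented = ["abc", "de", "f"]
tags = morphemes_to_bmes(segmented)
points = morphemes_to_points(segmented)
print(tags, points)
```

In this sketch, a tag sequence is valid only if neighbouring labels are compatible, which is one way to read the abstract's remark that the labels are not independent. Under the same assumptions, any strictly increasing list of offsets inside the word yields a well-formed segmentation.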

Change history

  • 13 October 2019

    The original version of this chapter contained an error in the second author’s name. The spelling of Shuqin Li’s name was incorrect in the header of the paper. The author’s name has been corrected.

Acknowledgements

This work was supported by the National Science Foundation of China (Grant Nos. 61772075 and 61772081), the Scientific Research Project of Beijing Educational Committee (Grant No. KM201711232022), the Beijing Municipal Education Committee (Grant No. SZ20171123228), and the Beijing Institute of Computer Technology and Application (grant from the Extensible Knowledge Graph Construction Technique Project).

Author information

Corresponding author

Correspondence to Hua-Ping Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, Y., Li, S., Zhang, Y., Zhang, HP. (2019). Point the Point: Uyghur Morphological Segmentation Using PointerNetwork with GRU. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_30

  • DOI: https://doi.org/10.1007/978-3-030-32381-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32380-6

  • Online ISBN: 978-3-030-32381-3

  • eBook Packages: Computer Science (R0)
