
Learning Morpheme Representation for Mongolian Named Entity Recognition

Published in: Neural Processing Letters

Abstract

Traditional approaches to Mongolian named entity recognition rely heavily on feature engineering. Worse still, the complex morphological structure of Mongolian words makes the data even sparser. To alleviate both feature engineering and data sparsity in Mongolian named entity recognition, we propose a framework of recurrent neural networks with morpheme representations, and we study this framework in depth through different model variants. More specifically, the morpheme representation exploits a characteristic of the classical Mongolian script and can be learned from an unlabeled corpus. The model is further augmented with different character representations and auxiliary language-model losses, which extract context knowledge from scratch. By decoding jointly with a conditional random field (CRF) layer, the model can learn the dependencies between labels. Experimental results show that feeding morpheme representations into the neural network outperforms word representations; the additional character representation and the morpheme language-model loss also improve performance.
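The joint decoding step described above can be illustrated with a minimal Viterbi decoder over a CRF layer's scores: the best label sequence maximizes the sum of per-token emission scores and label-to-label transition scores, so the transitions let the model penalize invalid label dependencies (e.g. `O` followed directly by `I-PER`). This is a pure-Python sketch; the label set and scores below are hypothetical toy values, not taken from the paper:

```python
# Minimal Viterbi decoding over CRF scores: finds the label sequence
# maximizing sum_t emission[t][y_t] + transition[y_{t-1}][y_t].
def viterbi(emissions, transitions):
    n_steps, n_labels = len(emissions), len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    back = []                   # backpointers for path recovery
    for t in range(1, n_steps):
        new_score, ptrs = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[t][y])
            ptrs.append(best_prev)
        score, back = new_score, back + [ptrs]
    # Trace the best path backwards from the highest-scoring final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for ptrs in reversed(back):
        y = ptrs[y]
        path.append(y)
    return list(reversed(path)), max(score)

# Toy 3-label BIO example; transitions heavily penalize O -> I-PER.
labels = ["O", "B-PER", "I-PER"]
emissions = [[2.0, 1.0, 0.0],      # step 0 prefers O
             [0.0, 1.5, 1.4],      # step 1 is ambiguous
             [0.0, 0.0, 2.0]]      # step 2 prefers I-PER
transitions = [[0.0, 0.0, -10.0],  # from O: O -> I-PER forbidden in practice
               [0.0, -1.0, 1.0],   # from B-PER: continuing the entity is cheap
               [0.0, -1.0, 1.0]]   # from I-PER
path, best = viterbi(emissions, transitions)
print([labels[y] for y in path])   # prints ['O', 'B-PER', 'I-PER']
```

Because the transition scores enter the decoding objective jointly with the emissions, the decoder returns a globally consistent BIO sequence rather than a greedy per-token choice.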


Figures 1–5 (available in the full article)



Author information

Correspondence to Feilong Bao (corresponding author).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (Nos. 61563040, 61773224); Natural Science Foundation of Inner Mongolia (No. 2016ZD06).

Cite this article

Wang, W., Bao, F. & Gao, G. Learning Morpheme Representation for Mongolian Named Entity Recognition. Neural Process Lett 50, 2647–2664 (2019). https://doi.org/10.1007/s11063-019-10044-6
