Abstract
Traditional approaches to Mongolian named entity recognition rely heavily on feature engineering. Worse, the complex morphological structure of Mongolian words aggravates data sparsity. To alleviate both problems, we propose a framework of recurrent neural networks with morpheme representations, and we study this framework in depth through several model variants. More specifically, the morpheme representation exploits a characteristic of the classical Mongolian script and can be learned from an unlabeled corpus. The model is further augmented with different character representations and auxiliary language model losses, which extract contextual knowledge from scratch. By decoding jointly with a Conditional Random Field (CRF) layer, the model learns the dependencies between labels. Experimental results show that feeding morpheme representations into the neural network outperforms word representations, and that the additional character representation and morpheme language model loss further improve performance.
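To illustrate the joint decoding the abstract refers to, the sketch below implements Viterbi decoding over label transition scores, the inference step of a CRF layer. The labels and all scores are illustrative assumptions, not values from the paper; a negative transition score stands in for a learned constraint (e.g. I-PER may not follow O).

```python
def viterbi_decode(emissions, transitions, labels):
    """Find the best label sequence given per-token emission scores
    and label-to-label transition scores (log-space, additive)."""
    n = len(emissions)
    # score[i][y] = best score of any path ending in label y at token i
    score = [dict(emissions[0])]
    back = []
    for i in range(1, n):
        score.append({})
        back.append({})
        for y in labels:
            best_prev = max(labels,
                            key=lambda p: score[i - 1][p] + transitions[(p, y)])
            back[-1][y] = best_prev
            score[i][y] = (score[i - 1][best_prev]
                           + transitions[(best_prev, y)] + emissions[i][y])
    # Backtrack from the best final label.
    last = max(labels, key=lambda y: score[-1][y])
    path = [last]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path))


labels = ["O", "B-PER", "I-PER"]
# Illustrative scores: transitions strongly penalize O -> I-PER,
# encoding the label dependency a CRF layer would learn.
transitions = {(p, y): 0.0 for p in labels for y in labels}
transitions[("O", "I-PER")] = -10.0
transitions[("B-PER", "I-PER")] = 2.0
emissions = [
    {"O": 1.0, "B-PER": 0.2, "I-PER": 0.0},
    {"O": 0.1, "B-PER": 1.5, "I-PER": 0.3},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.4},  # valid only after B-PER
]
best_path = viterbi_decode(emissions, transitions, labels)
print(best_path)  # → ['O', 'B-PER', 'I-PER']
```

In the full model, the emission scores would come from the bidirectional recurrent network over morpheme representations; the transition matrix is trained jointly with it.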
This work was supported by the National Natural Science Foundation of China (Nos. 61563040, 61773224); Natural Science Foundation of Inner Mongolia (No. 2016ZD06).
Wang, W., Bao, F. & Gao, G. Learning Morpheme Representation for Mongolian Named Entity Recognition. Neural Process Lett 50, 2647–2664 (2019). https://doi.org/10.1007/s11063-019-10044-6