Abstract
Named Entity Recognition (NER) in Arabic is challenging because of the morphological and lexical richness of the language. In this paper, we propose an Arabic NER system based on word embeddings. Word embeddings hold semantic information about the contexts in which words occur. We hypothesized that integrating word embedding features with conventional lexical and contextual features could improve Arabic NER performance. A Conditional Random Field (CRF) sequence classifier was used. Since most CRF implementations support only categorical features, the continuous word embedding vectors are clustered so that cluster identifiers can serve as categorical features. We mainly investigate the effect of the number of clusters on NER performance. Moreover, combining fine- and coarse-grained clusters resulted in a further improvement in recognition.
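The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors, the placeholder tokens, the cluster counts (2 and 4), and the helper names `kmeans` and `token_features` are all assumptions made for illustration, and plain k-means stands in for whichever clustering method the paper actually uses. The sketch clusters the vectors at a coarse and a fine granularity and exposes both cluster identifiers as categorical features next to simple contextual ones, in the dictionary form that typical CRF toolkits accept.

```python
import random

# Hypothetical toy embeddings (2-D for readability); real embeddings
# would be learned from a large corpus and have hundreds of dimensions.
EMBEDDINGS = {
    "cairo": (9.0, 9.5), "beirut": (9.4, 9.1), "tunis": (8.8, 9.9),
    "ahmed": (0.5, 9.0), "sara": (0.9, 9.4), "omar": (0.2, 8.7),
    "visited": (5.0, 0.5), "wrote": (5.4, 0.9), "said": (4.7, 0.2),
}

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_words(embeddings, k):
    """Map each word to a discrete cluster id."""
    words = list(embeddings)
    ids = kmeans([embeddings[w] for w in words], k)
    return dict(zip(words, ids))

def token_features(sentence, i, coarse, fine):
    """Categorical features for token i: context words plus cluster ids."""
    word = sentence[i]
    return {
        "word": word,
        "prev": sentence[i - 1] if i > 0 else "<BOS>",
        "next": sentence[i + 1] if i + 1 < len(sentence) else "<EOS>",
        # Words without an embedding get a shared out-of-vocabulary value.
        "cluster_coarse": "c%d" % coarse[word] if word in coarse else "OOV",
        "cluster_fine": "f%d" % fine[word] if word in fine else "OOV",
    }

coarse = cluster_words(EMBEDDINGS, k=2)  # assumed coarse granularity
fine = cluster_words(EMBEDDINGS, k=4)    # assumed fine granularity

sentence = ["ahmed", "visited", "cairo"]
for i in range(len(sentence)):
    print(token_features(sentence, i, coarse, fine))
```

Emitting the cluster identifiers as strings keeps every feature categorical, which is the constraint the abstract attributes to most CRF implementations. Using two granularities at once mirrors the fine and coarse combination reported above; the number of clusters at each level is the quantity the paper studies.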
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Sabty, C., Elmahdy, M., Abdennadher, S. (2023). Arabic Named Entity Recognition Using Clustered Word Embedding. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_4
DOI: https://doi.org/10.1007/978-3-031-23793-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23792-8
Online ISBN: 978-3-031-23793-5
eBook Packages: Computer Science, Computer Science (R0)