Arabic named entity recognition via deep co-learning

Artificial Intelligence Review

Abstract

Named entity recognition (NER) is an important natural language processing (NLP) task with many applications. We tackle Arabic NER using deep learning based on Arabic word embeddings that capture syntactic and semantic relationships between words. Deep learning has been shown to outperform other approaches on various NLP tasks, including NER. However, deep-learning models also require large amounts of training data, which are scarce for Arabic. To remedy this, we adapt the semi-supervised co-training approach to deep learning, which we refer to as deep co-learning. Our deep co-learning approach uses a small amount of labeled data, augmented with partially labeled data that is automatically generated from Wikipedia. It relies only on word embeddings as features and involves no additional feature engineering. Nonetheless, when tested on three different Arabic NER benchmarks, our approach consistently outperforms state-of-the-art Arabic NER approaches, including ones that employ carefully crafted NLP features. It also consistently outperforms various baselines, including purely supervised deep-learning approaches as well as semi-supervised ones that use only unlabeled data, such as self-learning and the traditional co-training approach.
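The abstract describes co-training only at a high level. As a rough illustration of the classic two-view co-training loop that deep co-learning builds on, here is a minimal sketch in Python. It is not the authors' method: simple count-based classifiers stand in for the deep networks, and the two views (`word`, the token itself, and `prev`, the preceding token), the confidence `threshold`, and the toy tokens are all hypothetical. The pool here is plain unlabeled data, not the partially labeled Wikipedia data the paper uses. In each round, whichever view is confident about an unlabeled token pseudo-labels it for the other view.

```python
from collections import Counter, defaultdict

# Toy two-view co-training sketch; data, feature names, and threshold are hypothetical.
# View "word" looks at the token itself; view "prev" looks at the preceding token.
VIEWS = ("word", "prev")


def train(examples, view):
    """Count how often each label occurs for each value of one view's feature."""
    table = defaultdict(Counter)
    for feats, label in examples:
        table[feats[view]][label] += 1
    return table


def predict(table, feats, view):
    """Return (label, confidence); confidence is the majority label's share of counts."""
    counts = table.get(feats[view])
    if not counts:
        return None, 0.0  # feature value never seen in this view's training data
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


def co_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Each view pseudo-labels examples it is confident about and hands them to the other view."""
    train_sets = {v: list(labeled) for v in VIEWS}  # both views start from the same seed data
    pool = list(unlabeled)
    for _ in range(max_rounds):
        tables = {v: train(train_sets[v], v) for v in VIEWS}
        remaining = []
        for feats in pool:
            for i, view in enumerate(VIEWS):
                label, conf = predict(tables[view], feats, view)
                if conf >= threshold:
                    # The confident view teaches the *other* view.
                    train_sets[VIEWS[1 - i]].append((feats, label))
                    break
            else:
                remaining.append(feats)  # neither view was confident
        if len(remaining) == len(pool):
            break  # no progress this round
        pool = remaining
    final = {v: train(train_sets[v], v) for v in VIEWS}
    return final, pool


labeled = [
    ({"word": "Beirut", "prev": "in"}, "LOC"),
    ({"word": "Salim", "prev": "Mr"}, "PER"),
    ({"word": "went", "prev": "he"}, "O"),
]
unlabeled = [
    {"word": "Cairo", "prev": "in"},
    {"word": "Cairo", "prev": "to"},
    {"word": "Rania", "prev": "Mr"},
    {"word": "Rania", "prev": "met"},
    {"word": "qwerty", "prev": "zzz"},
]

tables, leftover = co_train(labeled, unlabeled)
print(predict(tables["prev"], {"prev": "to"}, "prev"))  # ('LOC', 1.0)
print(len(leftover))  # 1
```

In this toy run, the `prev` view never sees a labeled example whose previous token is `to`; it learns `to → LOC` only because the `word` view first learned that `Cairo` is a location from a `prev`-view pseudo-label. A wrong pseudo-label would propagate between views just as readily, which is why the confidence threshold matters.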

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

The authors thank the American University of Beirut Research Board (URB) for funding this project under award number 103367.

Author information

Corresponding author

Correspondence to Chadi Helwe.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Helwe, C., Elbassuoni, S. Arabic named entity recognition via deep co-learning. Artif Intell Rev 52, 197–215 (2019). https://doi.org/10.1007/s10462-019-09688-6

  • DOI: https://doi.org/10.1007/s10462-019-09688-6
