Integrating Rule-Based System with Classification for Arabic Named Entity Recognition

Abdallah, Sherief; Shaalan, Khaled; Shoaib, Muhammad

doi:10.1007/978-3-642-28604-9_26


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

Named Entity Recognition (NER) is a subtask of information extraction that seeks to recognize named entities in unstructured text and classify them into predefined categories such as the names of persons, organizations, and locations. Most researchers have used machine learning to solve the NER problem, while few have used handcrafted rules. We focus here on NER for the Arabic language (NERA). Arabic is an important language with its own distinct challenges. This paper proposes a simple method for integrating machine learning with rule-based systems and implements this proposal using the state-of-the-art rule-based system for NERA. Experimental evaluation shows that our integrated approach increases the F-measure by 8 to 14% compared with both the original (pure) rule-based system and the (pure) machine learning approach, and that the improvement is statistically significant across different datasets. More importantly, our system outperforms the state-of-the-art machine learning system for NERA on a benchmark dataset.
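The abstract does not spell out how the two approaches are combined. One simple way to integrate them is to pass the rule-based system's decision to a classifier as one feature among others. The classifier can then learn when to trust the rules and when to override them. The sketch below is our own illustration of that idea and is not the authors' implementation. The trigger words, the name gazetteer, the two-feature vector, and `HybridTagger` are all hypothetical. A count-based lookup (the most frequent gold label per feature combination) stands in for a real classifier such as a decision tree. The F-measure is computed per token as the harmonic mean of precision and recall, F = 2PR / (P + R), which may differ from the paper's exact evaluation protocol.

```python
from collections import Counter, defaultdict

# Hypothetical resources (assumptions for illustration, not from the paper).
TRIGGER_WORDS = {"السيد", "الدكتور"}  # titles meaning "Mr." and "Dr."
NAME_GAZETTEER = {"أحمد", "محمد"}     # a tiny list of known first names


def rule_based_tag(tokens, i):
    """Stand-in rule-based tagger: PERSON if the previous token is a title."""
    if i > 0 and tokens[i - 1] in TRIGGER_WORDS:
        return "PERSON"
    return "O"


def features(tokens, i):
    """Feature vector: the rule-based decision plus one extra lexical feature."""
    return (rule_based_tag(tokens, i), tokens[i] in NAME_GAZETTEER)


class HybridTagger:
    """Learns which label to emit for each feature combination seen in training."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, sentences, labels):
        for tokens, gold in zip(sentences, labels):
            for i, label in enumerate(gold):
                self.counts[features(tokens, i)][label] += 1
        return self

    def predict(self, tokens):
        out = []
        for i in range(len(tokens)):
            f = features(tokens, i)
            if f in self.counts:
                # Most frequent gold label observed for this feature combination.
                out.append(self.counts[f].most_common(1)[0][0])
            else:
                # Unseen combination: fall back to the rule-based decision.
                out.append(f[0])
        return out


def f_measure(gold_seqs, pred_seqs):
    """Token-level precision, recall and F1 over non-O labels."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        for g, p in zip(gold, pred):
            if p != "O" and p == g:
                tp += 1
            else:
                if p != "O":
                    fp += 1  # predicted an entity that is wrong
                if g != "O":
                    fn += 1  # missed (or mislabeled) a gold entity
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (invented for illustration only).
train_sents = [["قال", "الدكتور", "أحمد", "اليوم"], ["زار", "السيد", "المدينة"], ["وصل", "محمد"]]
train_gold = [["O", "O", "PERSON", "O"], ["O", "O", "O"], ["O", "PERSON"]]
test_sents = [["قابل", "السيد", "محمد", "الوزير"], ["استقبل", "السيد", "الوزير", "أحمد"]]
test_gold = [["O", "O", "PERSON", "O"], ["O", "O", "O", "PERSON"]]

rule_preds = [[rule_based_tag(s, i) for i in range(len(s))] for s in test_sents]
tagger = HybridTagger().fit(train_sents, train_gold)
hybrid_preds = [tagger.predict(s) for s in test_sents]

print(f_measure(test_gold, rule_preds))    # (0.5, 0.5, 0.5)
print(f_measure(test_gold, hybrid_preds))  # (1.0, 1.0, 1.0)
```

On this toy data, the classifier learns two corrections. A title followed by a word outside the gazetteer ("السيد الوزير") is not a person, and a gazetteer name is a person even when no title precedes it. Both are errors that the rule alone makes. Any real gains depend entirely on the features and training data used.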



Author information

Authors and Affiliations

University of Edinburgh, UK
Sherief Abdallah & Khaled Shaalan
British University in Dubai, UAE
Sherief Abdallah, Khaled Shaalan & Muhammad Shoaib


Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Abdallah, S., Shaalan, K., Shoaib, M. (2012). Integrating Rule-Based System with Classification for Arabic Named Entity Recognition. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_26


DOI: https://doi.org/10.1007/978-3-642-28604-9_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28603-2
Online ISBN: 978-3-642-28604-9
eBook Packages: Computer Science, Computer Science (R0)
