ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy

Benajiba, Yassine; Rosso, Paolo; Benedí Ruiz, José Miguel

doi:10.1007/978-3-540-70939-8_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

The task of Named Entity Recognition (NER) consists of identifying proper names, as well as temporal and numeric expressions, in open-domain text. NER systems have proved to be very important for many Natural Language Processing (NLP) tasks, such as Information Retrieval and Question Answering. Unfortunately, the main efforts to build reliable NER systems for Arabic have been made in a commercial setting, and neither the approach used nor the accuracy achieved is publicly known. In this paper, we present ANERsys: an NER system built exclusively for Arabic texts, based on n-grams and maximum entropy. Furthermore, we present both the Arabic-specific, language-dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. We made a major effort to ensure that all experiments were carried out in the same framework as the CoNLL 2002 conference. We carried out several experiments, and the preliminary results show that this approach can successfully tackle the problem of NER for the Arabic language.
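The abstract only names the model family: word n-grams fed to a maximum entropy classifier, boosted by gazetteer lookups. As a rough illustration of that family, and not the ANERsys implementation, the sketch below trains a conditional maximum entropy tagger with plain batch gradient ascent. The feature templates, the BIO-style labels, the toy sentences (English tokens for readability) and the gazetteer entries standing in for ANERgazet are all hypothetical, and the paper's Arabic-specific heuristic is not modelled.

```python
import math
from collections import defaultdict

# Hypothetical toy gazetteer standing in for ANERgazet (entries are illustrative only).
GAZETTEER = {"Valencia": "LOC", "Madrid": "LOC"}


def extract_features(tokens, i, gazetteer):
    """Binary features for token i: word n-grams in a +/-1 window plus a gazetteer flag.

    These feature templates are assumptions for illustration, not the ANERsys templates.
    """
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<s>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = [
        "bias",
        f"w0={word}",
        f"w-1={prev}",
        f"w+1={nxt}",
        f"w-1,w0={prev}|{word}",  # bigram ending at the current word
        f"w0,w+1={word}|{nxt}",  # bigram starting at the current word
    ]
    if word in gazetteer:
        feats.append(f"gaz={gazetteer[word]}")
    return feats


class MaxEntTagger:
    """Conditional maximum entropy model: p(y|x) is proportional to exp(sum_f w[f, y])."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # keyed by (feature, label)

    def probs(self, feats):
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=300, lr=0.2, l2=0.01):
        """Batch gradient ascent on the mean L2-regularised conditional log-likelihood."""
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for k, y in enumerate(self.labels):
                    # Observed minus expected count of each active (feature, label) pair.
                    delta = (1.0 if y == gold else 0.0) - p[k]
                    for f in feats:
                        grad[(f, y)] += delta
            for key in set(grad) | set(self.weights):
                self.weights[key] += lr * (grad[key] / n - l2 * self.weights[key])

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[max(range(len(p)), key=p.__getitem__)]


# Toy usage on two hand-labelled sentences (hypothetical data).
SENTENCES = [
    (["Yassine", "lives", "in", "Valencia"], ["B-PERS", "O", "O", "B-LOC"]),
    (["Paolo", "visited", "Madrid", "today"], ["B-PERS", "O", "B-LOC", "O"]),
]
train_data = [
    (extract_features(tokens, i, GAZETTEER), tags[i])
    for tokens, tags in SENTENCES
    for i in range(len(tokens))
]
tagger = MaxEntTagger(["O", "B-PERS", "B-LOC"])
tagger.train(train_data)
for tokens, _ in SENTENCES:
    print([tagger.predict(extract_features(tokens, i, GAZETTEER)) for i in range(len(tokens))])
```

Because the gazetteer match is just one more weighted feature, it acts as soft evidence that the n-gram context can outvote, rather than as a hard lookup rule. This is one plausible reading of using gazetteers to "boost" a statistical tagger, not a description of how ANERsys combines them.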

Author information

Authors and Affiliations

Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
Yassine Benajiba, Paolo Rosso & José Miguel Benedí Ruiz

Editor information

Alexander Gelbukh

About this paper

Cite this paper

Benajiba, Y., Rosso, P., Benedí Ruiz, J.M. (2007). ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_13

DOI: https://doi.org/10.1007/978-3-540-70939-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)