Abstract
Named Entity Recognition (NER) is the task of identifying proper names, as well as temporal and numeric expressions, in open-domain text. NER systems have proved very important for many Natural Language Processing (NLP) tasks, such as Information Retrieval and Question Answering. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial setting, and neither the approaches used nor the accuracy achieved are known. In this paper, we present ANERsys, a NER system built exclusively for Arabic texts and based on n-grams and maximum entropy. Furthermore, we present both the Arabic-specific heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. Considerable effort was made to ensure that all experiments were carried out within the framework of the CoNLL 2002 conference. We carried out several experiments, and the preliminary results show that this approach makes it possible to tackle the problem of NER for the Arabic language successfully.
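As a rough illustration of how n-gram context and gazetteer lookups can feed a maximum entropy classifier, the following is a minimal sketch and not the ANERsys implementation. The label set, the gazetteer contents, the feature names and the weights are all hypothetical, and English tokens stand in for Arabic ones for readability.

```python
import math

# Hypothetical label set and gazetteer; neither is taken from ANERsys or ANERgazet.
LABELS = ["PER", "LOC", "O"]
GAZETTEER = {"LOC": {"Cairo", "Rabat"}}


def features(tokens, i):
    """Binary features for token i: the word, its left context (bigram), gazetteer hits."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    feats = ["w=" + tokens[i], "prev=" + prev, "bigram=" + prev + "_" + tokens[i]]
    for label, names in GAZETTEER.items():
        if tokens[i] in names:
            feats.append("gaz=" + label)
    return feats


def maxent_probs(feats, weights):
    """p(label | feats) = exp(sum of weights of active (feature, label) pairs) / Z."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


# Hand-set illustrative weights; a real system would estimate them from training data.
weights = {("gaz=LOC", "LOC"): 2.0, ("prev=in", "LOC"): 1.0}
tokens = ["He", "lives", "in", "Cairo"]
probs = maxent_probs(features(tokens, 3), weights)
best = max(probs, key=probs.get)
```

In this sketch the gazetteer acts as one more binary feature, so a lookup hit raises the score of the matching label without overriding the contextual evidence.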
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Benajiba, Y., Rosso, P., Benedí Ruiz, J.M. (2007). ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_13
DOI: https://doi.org/10.1007/978-3-540-70939-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)