ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

The task of Named Entity Recognition (NER) is to identify proper names, as well as temporal and numeric expressions, in open-domain text. NER systems have proved to be very important for many Natural Language Processing (NLP) tasks, such as Information Retrieval and Question Answering. Unfortunately, the main efforts to build reliable NER systems for Arabic have been made in a commercial setting, and neither the approach used nor the accuracy achieved is publicly known. In this paper, we present ANERsys, a NER system built exclusively for Arabic texts and based on n-grams and maximum entropy. We also present the Arabic-specific, language-dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was made to ensure that all experiments were carried out within the same framework as the CoNLL 2002 conference. We carried out several experiments, and the preliminary results showed that this approach successfully tackles the problem of NER for Arabic.
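The abstract names the core technique (a maximum entropy classifier over n-gram context features, boosted by gazetteers) without detailing it. The following is a minimal sketch under our own assumptions, not ANERsys itself: each token is described by its word, its left and right neighbours, the left bigram, and membership in a tiny hypothetical `GAZETTEER` standing in for ANERgazet. A conditional maximum entropy (multinomial logistic) model is trained here by plain batch gradient ascent with an L2 penalty, rather than by an iterative scaling algorithm, and each token receives a CoNLL-style IOB label independently. The training sentences, `GAZETTEER`, and every function and class name are illustrative; the paper's Arabic-specific heuristic is not modeled.

```python
import math
from collections import defaultdict

# Hypothetical toy gazetteer standing in for ANERgazet (not its real contents).
GAZETTEER = {
    "LOC": {"القاهرة", "تونس", "بغداد"},
    "PER": {"محمد", "علي"},
}

# Toy IOB-tagged training sentences (illustrative, not taken from ANERcorp).
TRAIN = [
    (["زار", "محمد", "القاهرة"], ["O", "B-PER", "B-LOC"]),
    (["سافر", "علي", "إلى", "تونس"], ["O", "B-PER", "O", "B-LOC"]),
]


def token_features(tokens, i):
    """Binary features for tokens[i]: word unigrams in a +/-1 window,
    the left bigram, and gazetteer membership."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = [
        "bias",
        f"w0={tokens[i]}",
        f"w-1={prev_w}",
        f"w+1={next_w}",
        f"bigram={prev_w}|{tokens[i]}",
    ]
    feats += [f"gaz={cls}" for cls, names in GAZETTEER.items() if tokens[i] in names]
    return feats


class MaxEntClassifier:
    """Conditional maximum entropy model: p(y|x) is proportional to
    exp(sum of w[f, y] over the active binary features f of x)."""

    def __init__(self, epochs=200, lr=0.1, l2=0.01):
        self.epochs, self.lr, self.l2 = epochs, lr, l2
        self.w = {}  # (feature, label) -> weight
        self.labels = []

    def probs(self, feats):
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

    def fit(self, examples):
        """Batch gradient ascent on the L2-penalised conditional log-likelihood.
        Gradient for (f, y) = empirical count - expected count under the model."""
        self.labels = sorted({y for _, y in examples})
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, gold in examples:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0
                    for y in self.labels:
                        grad[(f, y)] -= p[y]
            for key in set(grad) | set(self.w):
                old = self.w.get(key, 0.0)
                self.w[key] = old + self.lr * (grad[key] - self.l2 * old)
        return self


def to_examples(sentences):
    return [(token_features(toks, i), tags[i])
            for toks, tags in sentences for i in range(len(toks))]


def tag(model, tokens):
    # Each token is classified independently; there is no sequence decoding.
    return [model.predict(token_features(tokens, i)) for i in range(len(tokens))]


if __name__ == "__main__":
    model = MaxEntClassifier().fit(to_examples(TRAIN))
    # "بغداد" never occurs in TRAIN; context and the gazetteer must carry it.
    print(tag(model, ["زار", "محمد", "بغداد"]))
```

The unseen word "بغداد" can only be labeled through its context features and its gazetteer entry, which is the kind of generalization gazetteers are meant to provide. Independent per-token classification keeps the sketch short; a real tagger would also need to handle Arabic tokenization and the language-specific heuristic the paper describes.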

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benajiba, Y., Rosso, P., Benedí Ruiz, J.M. (2007). ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_13

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)