
Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3999)

Abstract

This paper presents a Named Entity Recognition (NER) system for Spanish that combines machine-learning and knowledge-based approaches. Our contribution focuses on two issues. The first is a discussion of how to select the best features for a machine-learning NER system. The second is an error analysis of this system, which led us to create a set of general post-processing rules. Both issues are explained in detail and then evaluated. Feature selection improves on the results of our previous system by around 2.3%, while applying the set of post-processing rules improves performance by around 3.6%, reaching a final F-score of 83.37%.
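The abstract reports its results as an F-score. The abstract does not say how the score is computed. The following is a minimal sketch that assumes the usual CoNLL-style convention: exact-match, entity-level scoring over BIO-tagged tokens with β = 1. An entity counts as correct only if both its boundaries and its type match the gold annotation. The function names `bio_to_spans` and `f_score` and the example tag sequences are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as B-PER, I-PER, O.

    An I- tag that does not continue an open entity of the same type is
    treated as the start of a new entity (assumed conlleval-like behaviour).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            spans.add((start, i, etype))  # close the currently open entity
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label  # open a new entity at this token
    return spans


def f_score(gold: List[str], pred: List[str], beta: float = 1.0) -> Tuple[float, float, float]:
    """Return exact-match entity-level (precision, recall, F-beta)."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)  # same boundaries and same type
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
    pred = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-ORG"]  # LOC mistyped as ORG
    p, r, f = f_score(gold, pred)
    print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.6667 0.6667
```

In this toy example, two of the three entities are found with the correct type, so precision, recall, and F-score are all 2/3. Under this exact-match convention, a post-processing rule that fixes an entity's type or boundary turns both a false positive and a false negative into a true positive. This is one reason rule-based corrections of systematic errors can raise the F-score noticeably.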

This research has been partially funded by the Spanish Government under project CICyT number TIC2003-07158-C04-01.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferrández, Ó., Toral, A., Muñoz, R. (2006). Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_16


  • DOI: https://doi.org/10.1007/11765448_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34616-6

  • Online ISBN: 978-3-540-34617-3

  • eBook Packages: Computer Science, Computer Science (R0)
