Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition

Ferrández, Óscar; Toral, Antonio; Muñoz, Rafael

doi:10.1007/11765448_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3999)

Abstract

This paper presents a Named Entity Recognition (NER) system for Spanish that combines machine learning and knowledge-based approaches. Our contribution is twofold: first, a discussion of how to select the best features for a machine learning NER system; second, an error study of this system, which led us to create a set of general post-processing rules. Both are explained in detail and then evaluated. Feature selection improves on the results of our previous system by around 2.3%, while applying the post-processing rules yields a gain of around 3.6%, for a final F-score of 83.37%.
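The abstract does not list the post-processing rules themselves, so the following is only a minimal sketch of what one such rule and its evaluation might look like. It assumes BIO-tagged output and exact-match span scoring; the rule in `repair_bio`, the helper names, and the toy sentence are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def repair_bio(tags: List[str]) -> List[str]:
    """Hypothetical rule: an I-X tag that does not continue an entity
    of type X is rewritten to B-X, so the sequence becomes valid BIO."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


def spans(tags: List[str]) -> List[Span]:
    """Collect entity spans; an entity only opens at a B- tag, so
    orphan I- tags are ignored."""
    out: List[Span] = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final entity
        if start is not None and not tag.startswith("I-"):
            out.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return out


def f_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match F-score: a predicted span counts only if its
    boundaries and type both match a gold span."""
    g, p = set(spans(gold)), set(spans(pred))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): the tagger opened a person name with I-PER.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
pred = ["I-PER", "I-PER", "O", "B-LOC", "O", "O"]

print(round(f_score(gold, pred), 2))              # 0.5
print(round(f_score(gold, repair_bio(pred)), 2))  # 0.8
```

In this toy case the rule recovers one entity that exact-match scoring would otherwise discard, which is the general way a post-processing rule can raise the F-score without retraining the classifier.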

This research has been partially funded by the Spanish Government under project CICyT number TIC2003-07158-C04-01.

Author information

Authors and Affiliations

Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Spain
Óscar Ferrández, Antonio Toral & Rafael Muñoz

Editor information

Editors and Affiliations

Institute for Applied Informatics, Alpen-Adria-Universität Klagenfurt, Austria
Christian Kop & Heinrich C. Mayr
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

Cite this paper

Ferrández, Ó., Toral, A., Muñoz, R. (2006). Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_16

DOI: https://doi.org/10.1007/11765448_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science, Computer Science (R0)
