The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

  • Conference paper
Evaluation of Natural Language and Speech Tools for Italian (EVALITA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7689)

Abstract

The Tanl Tagger is a flexible sequence labeller based on a Conditional Markov Model. It can be configured to use different classifiers and to extract features according to feature templates, which are expressed as patterns in a configuration file. We applied the Tanl Tagger to the Evalita 2011 task of Named Entity Recognition (NER) on Transcribed Broadcast News, whose goal was to identify named entities in texts produced by an Automatic Speech Recognition (ASR) system. Such texts lack capitalization, punctuation and even sentence segmentation, and the transcription is often noisy, which makes the task a challenge for state-of-the-art NER tools. We report the results of our experiments with the Tanl Tagger, as well as with another widely available tagger, in both the closed and open modalities.
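The general idea of a template-driven Conditional Markov Model tagger can be illustrated with a small sketch. Everything specific here is an assumption of this example, not the Tanl Tagger's actual design: the hypothetical template syntax (`word:-1`, `suffix3:0`, `tag:-1`), the multiclass perceptron standing in for the configurable classifier, greedy left-to-right decoding, and the toy lowercased, unpunctuated sentences meant to resemble ASR output. What it does show is the core mechanism: features are instantiated from templates at each position, and previously predicted tags are themselves features for the next decision.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical template syntax (not Tanl's real config format):
# "<attribute>:<offset>", e.g. "word:-1" is the previous token and
# "tag:-1" is the tag already predicted for the previous token.
TEMPLATES = ["word:0", "word:-1", "word:1", "suffix3:0", "tag:-1", "tag:-2"]
PAD = "<PAD>"


def attribute(tokens: Sequence[str], tags: Sequence[str], name: str, j: int) -> str:
    """Value of attribute `name` at position j, padded outside the sentence."""
    if j < 0 or j >= len(tokens):
        return PAD
    if name == "word":
        return tokens[j]
    if name == "suffix3":
        return tokens[j][-3:]
    if name == "tag":
        # Only positions to the left of the current one have a tag yet.
        return tags[j] if j < len(tags) else PAD
    raise ValueError(f"unknown attribute {name!r}")


def extract_features(tokens: Sequence[str], tags: Sequence[str], i: int,
                     templates: Sequence[str] = TEMPLATES) -> List[str]:
    """Instantiate every template at position i as a 'template=value' string."""
    feats = []
    for template in templates:
        name, offset = template.split(":")
        feats.append(f"{template}={attribute(tokens, tags, name, i + int(offset))}")
    return feats


class Perceptron:
    """Multiclass perceptron standing in for whichever classifier is configured."""

    def __init__(self, labels: Sequence[str]) -> None:
        self.labels = sorted(labels)
        self.w: Dict[Tuple[str, str], float] = defaultdict(float)

    def predict(self, feats: Sequence[str]) -> str:
        # .get avoids inserting zero entries for unseen (feature, label) pairs.
        return max(self.labels,
                   key=lambda y: sum(self.w.get((f, y), 0.0) for f in feats))

    def update(self, feats: Sequence[str], gold: str, guess: str) -> None:
        if gold != guess:
            for f in feats:
                self.w[(f, gold)] += 1.0
                self.w[(f, guess)] -= 1.0


def train(corpus: Sequence[Tuple[List[str], List[str]]],
          labels: Sequence[str], epochs: int = 10) -> Perceptron:
    clf = Perceptron(labels)
    for _ in range(epochs):
        for tokens, gold in corpus:
            for i in range(len(tokens)):
                # During training the tag history comes from the gold labels.
                feats = extract_features(tokens, gold[:i], i)
                clf.update(feats, gold[i], clf.predict(feats))
    return clf


def tag(clf: Perceptron, tokens: Sequence[str]) -> List[str]:
    """Greedy left-to-right decoding: each prediction feeds later features."""
    tags: List[str] = []
    for i in range(len(tokens)):
        tags.append(clf.predict(extract_features(tokens, tags, i)))
    return tags


if __name__ == "__main__":
    # Toy ASR-style input: lowercase, no punctuation (hypothetical data).
    corpus = [
        ("il presidente napolitano ha parlato a roma".split(),
         ["O", "O", "B-PER", "O", "O", "O", "B-GPE"]),
        ("mario monti arriva a milano".split(),
         ["B-PER", "I-PER", "O", "O", "B-GPE"]),
    ]
    model = train(corpus, {"O", "B-PER", "I-PER", "B-GPE"}, epochs=200)
    print(tag(model, "mario monti parla a roma".split()))
```

Any object exposing the same `predict`/`update` methods could replace `Perceptron`, which loosely mirrors the idea of a configurable classifier. Greedy decoding is simply the easiest choice for a sketch; beam or Viterbi search is a common alternative for Conditional Markov Models.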

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Attardi, G., Berardi, G., Dei Rossi, S., Simi, M. (2013). The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_13

  • DOI: https://doi.org/10.1007/978-3-642-35828-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35827-2

  • Online ISBN: 978-3-642-35828-9

  • eBook Packages: Computer Science, Computer Science (R0)
