The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

Attardi, Giuseppe; Berardi, Giacomo; Dei Rossi, Stefano; Simi, Maria

doi:10.1007/978-3-642-35828-9_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7689)

Included in the following conference series:

International Workshop on Evaluation of Natural Language and Speech Tool for Italian

Abstract

The Tanl Tagger is a flexible sequence labeller based on a Conditional Markov Model. It can be configured to use different classifiers and to extract features according to feature templates, expressed as patterns in a configuration file. The Tanl Tagger was applied to the Named Entity Recognition (NER) on Transcribed Broadcast News task of Evalita 2011. The goal of the task was to identify named entities in texts produced by an Automatic Speech Recognition (ASR) system. Such texts lack capitalization, punctuation and even sentence segmentation, and the transcription is often noisy, which makes the task a challenge for state-of-the-art NER tools. We report on the results of our experiments using the Tanl Tagger, as well as another widely available tagger, in both the closed and open modalities.
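
The following is a minimal sketch of the general idea described in the abstract, not the Tanl Tagger itself. It assumes a hypothetical template format (feature name, token offset, regular expression), uses a simple multiclass perceptron as a stand-in for the configurable classifier, and decodes greedily from left to right, feeding each predicted tag into the features of the next token. The training sentences and tag names are invented toy data; tokens are lowercase and unpunctuated, as in ASR output.

```python
import re
from collections import defaultdict

# Hypothetical feature templates: (feature name, token offset, regex).
# A pattern with a capture group yields the captured text as the value;
# a pattern without one yields the value "true". Not the Tanl syntax.
TEMPLATES = [
    ("word", 0, r"(.+)"),
    ("prev_word", -1, r"(.+)"),
    ("next_word", 1, r"(.+)"),
    ("suffix3", 0, r"(.{3})$"),
    ("is_number", 0, r"^\d+$"),
]

def extract_features(tokens, i, prev_tag):
    """Features for token i: the previous tag plus every matching template."""
    feats = [f"prev_tag={prev_tag}"]
    for name, offset, pattern in TEMPLATES:
        j = i + offset
        if 0 <= j < len(tokens):
            m = re.search(pattern, tokens[j])
            if m:
                value = m.group(1) if m.re.groups else "true"
                feats.append(f"{name}={value}")
    return feats

class Perceptron:
    """Stand-in classifier; any model with predict/update could be plugged in."""
    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = defaultdict(lambda: defaultdict(float))

    def predict(self, feats):
        scores = {label: 0.0 for label in self.labels}
        for f in feats:
            for label, w in self.weights[f].items():
                scores[label] += w
        return max(scores, key=scores.get)

    def update(self, feats, gold, guess):
        for f in feats:
            self.weights[f][gold] += 1.0
            self.weights[f][guess] -= 1.0

def train(sentences, epochs=10):
    """Train on (tokens, tags) pairs, conditioning on the gold previous tag."""
    labels = {t for _, tags in sentences for t in tags}
    model = Perceptron(labels)
    for _ in range(epochs):
        for tokens, tags in sentences:
            prev = "<s>"
            for i, gold in enumerate(tags):
                feats = extract_features(tokens, i, prev)
                guess = model.predict(feats)
                if guess != gold:
                    model.update(feats, gold, guess)
                prev = gold
    return model

def tag(model, tokens):
    """Greedy left-to-right decoding: each tag depends on the previous one."""
    tags, prev = [], "<s>"
    for i in range(len(tokens)):
        prev = model.predict(extract_features(tokens, i, prev))
        tags.append(prev)
    return tags

# Invented toy data for illustration only.
TRAIN = [
    (["il", "presidente", "napolitano", "a", "roma"],
     ["O", "O", "B-PER", "O", "B-GPE"]),
    (["oggi", "a", "milano"], ["O", "O", "B-GPE"]),
]

if __name__ == "__main__":
    model = train(TRAIN)
    print(tag(model, ["oggi", "a", "roma"]))
```

Keeping the templates in a data structure separate from the tagging loop mirrors the configurability the abstract describes: features can be changed without touching the decoder, and the classifier can be swapped for any model exposing the same interface.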

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, I-56127, Pisa, Italy
Giuseppe Attardi, Giacomo Berardi, Stefano Dei Rossi & Maria Simi

Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
Bernardo Magnini
University of Naples, Via Cinthia, 80126, Napoli, NA, Italy
Francesco Cutugno
Fondazione Ugo Bordoni, Viale del Policlinico, 161, Roma, Italy
Mauro Falcone
CELCT, Via alla Cascata, 38123, Povo, TN, Italy
Emanuele Pianta

About this paper

Cite this paper

Attardi, G., Berardi, G., Dei Rossi, S., Simi, M. (2013). The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_13

DOI: https://doi.org/10.1007/978-3-642-35828-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35827-2
Online ISBN: 978-3-642-35828-9
eBook Packages: Computer Science, Computer Science (R0)
