Spoken Information Extraction from Italian Broadcast News

Sandrini, Vanessa; Federico, Marcello

doi:10.1007/3-540-36618-0_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Current research on information extraction from spoken documents focuses mainly on recognizing named entities, such as the names of organizations, locations, and persons, in transcripts generated automatically by a speech recognizer. In this work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. The paper also presents and discusses named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, with the named entity model trained on seed lists of different sizes.
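
The abstract gives no detail of the tagger beyond its being statistical and trained from a seed list plus untagged text. As a rough illustration of that general idea only, and not the authors' model, the toy Python sketch below uses seed-list matches in untagged sentences to count which entity class tends to follow each preceding word, then uses those counts to label capitalized words missing from the seed list. The function names, the previous-word context, and the example data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_context_model(sentences, seed):
    """Count how often each previous word precedes a seed entity of each class.

    Hypothetical helper: `sentences` is a list of token lists (untagged text),
    `seed` maps a known entity token to its class label.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            label = seed.get(tok)
            if label is not None:
                prev = tokens[i - 1].lower() if i > 0 else "<s>"
                counts[prev][label] += 1
    return counts


def tag(tokens, seed, counts):
    """Label each token with an entity class or "O" (no entity).

    Seed entries are tagged directly. Other capitalized, non-initial tokens
    take the class most often seen after the same previous word, if any.
    Relying on capitalization assumes a case-sensitive transcript.
    """
    tags = []
    for i, tok in enumerate(tokens):
        if tok in seed:
            tags.append(seed[tok])
            continue
        context = counts.get(tokens[i - 1].lower()) if i > 0 else None
        if tok[:1].isupper() and context:
            tags.append(context.most_common(1)[0][0])
        else:
            tags.append("O")
    return tags


# Invented example data, not taken from the paper.
seed = {"Roma": "LOCATION", "Milano": "LOCATION", "Prodi": "PERSON"}
corpus = [
    ["il", "sindaco", "di", "Roma"],
    ["arrivato", "a", "Milano"],
    ["il", "presidente", "Prodi"],
    ["la", "squadra", "di", "Milano"],
]
model = train_context_model(corpus, seed)
result = tag(["il", "presidente", "Ciampi", "a", "Trento"], seed, model)
print(result)
```

Here "Ciampi" and "Trento" are absent from the seed list and are labeled only through the contexts learned from seed matches. A real statistical tagger would replace the raw counts with a properly estimated probabilistic model.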

Author information

Authors and Affiliations

ITC-irst — Centro per la Ricerca Scientifica e Tecnologica, 38050, Povo, Trento, Italy
Vanessa Sandrini & Marcello Federico

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Sandrini, V., Federico, M. (2003). Spoken Information Extraction from Italian Broadcast News. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_11

DOI: https://doi.org/10.1007/3-540-36618-0_11
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive
