Abstract
Current research on information extraction from spoken documents mainly focuses on recognizing named entities, such as the names of organizations, locations and persons, in transcripts automatically generated by a speech recognizer. In this work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. Moreover, the paper presents named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, and with named entity models trained on seed lists of different sizes.
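The abstract says the tagger needs only a seed list and untagged text, but it does not describe the model. The sketch below is therefore **not** the ITC-irst tagger. It shows one simple, generic way such bootstrapping can work, as hard self-training. First, occurrences of seed names are located in the untagged corpus. Next, the words immediately to their left and right are counted per entity class. Finally, unseen capitalized tokens are labeled with the class whose smoothed naive-Bayes context score is highest, and these labels are added back to the seed list. Every function name, the two-word context window, the add-alpha smoothing and the toy data are hypothetical choices made for illustration. Only single-token names are handled.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, i):
    """Hypothetical feature set: the left and right neighbour words of position i."""
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["L=" + left, "R=" + right]


def train_context_model(corpus, seeds):
    """Count context features around occurrences of seed names, per entity class."""
    counts = defaultdict(Counter)
    class_totals = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            cls = seeds.get(tok)
            if cls is not None:
                class_totals[cls] += 1
                counts[cls].update(context_features(tokens, i))
    return counts, class_totals


def classify(tokens, i, counts, class_totals, alpha=1.0):
    """Return the class maximising log P(class) + sum of log P(feature | class),
    with add-alpha smoothing; returns None if no class has been observed."""
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    v = len(vocab) + 1  # +1 reserves mass for unseen features
    total = sum(class_totals.values())
    best, best_score = None, -math.inf
    for cls, n in class_totals.items():
        score = math.log(n / total)
        denom = sum(counts[cls].values()) + alpha * v
        for f in context_features(tokens, i):
            score += math.log((counts[cls][f] + alpha) / denom)
        if score > best_score:
            best, best_score = cls, score
    return best


def bootstrap(corpus, seeds, rounds=2):
    """Hard self-training: label unseen capitalised tokens (skipping sentence-initial
    ones, whose capital letter is uninformative) and add them to the seed list."""
    seeds = dict(seeds)
    for _ in range(rounds):
        counts, class_totals = train_context_model(corpus, seeds)
        new = {}
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if i > 0 and tok[:1].isupper() and tok not in seeds:
                    cls = classify(tokens, i, counts, class_totals)
                    if cls is not None:
                        new[tok] = cls
        seeds.update(new)
    return seeds


if __name__ == "__main__":
    # Toy, invented data: two seed names and four untagged sentences.
    corpus = [
        ["ieri", "il", "presidente", "Ciampi", "ha", "detto"],
        ["ieri", "il", "presidente", "Prodi", "ha", "parlato"],
        ["siamo", "a", "Roma", "oggi"],
        ["siamo", "a", "Trento", "oggi"],
    ]
    tagged = bootstrap(corpus, {"Ciampi": "PER", "Roma": "LOC"})
    print(tagged["Prodi"], tagged["Trento"])  # PER LOC
```

On this toy corpus, "Prodi" shares the contexts `presidente _ ha` with the seed person "Ciampi", and "Trento" shares `a _ oggi` with the seed location "Roma". They are therefore tagged PER and LOC. A soft, EM-style variant would keep class posteriors for each new name instead of committing to a single label per round.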
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sandrini, V., Federico, M. (2003). Spoken Information Extraction from Italian Broadcast News. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_11
DOI: https://doi.org/10.1007/3-540-36618-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive