Abstract
Spoken audio documents are becoming more and more common on the World Wide Web, and this is likely to be accelerated by the widespread deployment of broadband technologies. Unfortunately, speech documents are inherently hard to browse because of their transient nature. One approach to this problem is to label segments of a spoken document with keyphrases that summarise them. In this paper, we investigate an approach for automatically extracting keyphrases from spoken audio documents. We use a keyphrase extraction system (Extractor) originally developed for text, and apply it to errorful Speech Recognition transcripts, which may contain multiple hypotheses for each of the utterances. We show that keyphrase extraction is an “easier” task than full text transcription and that keyphrases can be extracted with reasonable precision from transcripts with Word Error Rates (WER) as high as 62%. This robustness to noise can be attributed to the fact that keyphrase words have a lower WER than non-keyphrase words and that they tend to have more redundancy in the audio. From this we conclude that keyphrase extraction is feasible for a wide range of spoken documents, including less-than-broadcast casual speech. We also show that including multiple utterance hypotheses does not improve the precision of the extracted keyphrases.
We would like to thank Peter Turney for his helpful insights regarding the Extractor system. Thanks also to Bruno Emond, who reviewed an early draft of the paper, and to six anonymous reviewers whose comments resulted in a much improved paper.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Désilets, A., de Bruijn, B., Martin, J. (2002). Extracting Keyphrases from Spoken Audio Documents. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_4
DOI: https://doi.org/10.1007/3-540-45637-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43156-5
Online ISBN: 978-3-540-45637-7