Extracting Keyphrases from Spoken Audio Documents

Désilets, Alain; de Bruijn, Berry; Martin, Joel

doi:10.1007/3-540-45637-6_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2273)

Included in the following conference series:

Workshop on Information Retrieval Techniques for Speech Applications

Abstract

Spoken audio documents are becoming more and more common on the World Wide Web, and this is likely to be accelerated by the widespread deployment of broadband technologies. Unfortunately, speech documents are inherently hard to browse because of their transient nature. One approach to this problem is to label segments of a spoken document with keyphrases that summarise them. In this paper, we investigate an approach for automatically extracting keyphrases from spoken audio documents. We use a keyphrase extraction system (Extractor) originally developed for text, and apply it to errorful Speech Recognition transcripts, which may contain multiple hypotheses for each of the utterances. We show that keyphrase extraction is an “easier” task than full text transcription and that keyphrases can be extracted with reasonable precision from transcripts with Word Error Rates (WER) as high as 62%. This robustness to noise can be attributed to the fact that keyphrase words have a lower WER than non-keyphrase words and that they tend to have more redundancy in the audio. From this we conclude that keyphrase extraction is feasible for a wide range of spoken documents, including less-than-broadcast casual speech. We also show that including multiple utterance hypotheses does not improve the precision of the extracted keyphrases.
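
The two measures the abstract relies on, Word Error Rate and keyphrase precision, can be illustrated with a minimal sketch. It assumes the standard WER definition (word-level edit distance divided by the number of reference words) and a simple exact-match precision; the abstract does not state the paper's own scoring procedure, so the function names `word_error_rate` and `keyphrase_precision` and the matching rule are hypothetical.

```python
from typing import Iterable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def keyphrase_precision(extracted: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of extracted keyphrases found in a reference set.

    Assumption: case-insensitive exact matching; the paper may score differently.
    """
    extracted_list = [phrase.lower() for phrase in extracted]
    if not extracted_list:
        return 0.0
    relevant_set = {phrase.lower() for phrase in relevant}
    hits = sum(1 for phrase in extracted_list if phrase in relevant_set)
    return hits / len(extracted_list)
```

Under this definition, a transcript with a WER of 0.62 contains roughly 62 word-level errors per 100 reference words, while keyphrase precision looks only at the short list of phrases actually extracted.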

We would like to thank Peter Turney for his helpful insights regarding the Extractor system. Thanks also to Bruno Emond, who reviewed an early draft of the paper, and to 6 anonymous reviewers whose comments resulted in a much improved paper.

Author information

Authors and Affiliations

National Research Council of Canada, Bldg M-50, Montreal Road, K1A 0R6, Ottawa, Ont, Canada
Alain Désilets, Berry de Bruijn & Joel Martin

Editor information

Editors and Affiliations

IBM T.J. Watson Research Center, P.O.Box 704, 10598, Yorktown Heights, NY, USA
Anni R. Coden & Eric W. Brown
IBM Almaden Research Center, 650 Harry Road, 95120, San Jose, CA, USA
Savitha Srinivasan

Cite this paper

Désilets, A., de Bruijn, B., Martin, J. (2002). Extracting Keyphrases from Spoken Audio Documents. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_4

DOI: https://doi.org/10.1007/3-540-45637-6_4
Published: 22 January 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43156-5
Online ISBN: 978-3-540-45637-7
eBook Packages: Springer Book Archive