Extracting Keyphrases from Spoken Audio Documents

  • Conference paper
Information Retrieval Techniques for Speech Applications (IRTSA 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2273))

Abstract

Spoken audio documents are becoming more and more common on the World Wide Web, and this is likely to be accelerated by the widespread deployment of broadband technologies. Unfortunately, speech documents are inherently hard to browse because of their transient nature. One approach to this problem is to label segments of a spoken document with keyphrases that summarise them. In this paper, we investigate an approach for automatically extracting keyphrases from spoken audio documents. We use a keyphrase extraction system (Extractor) originally developed for text, and apply it to errorful Speech Recognition transcripts, which may contain multiple hypotheses for each of the utterances. We show that keyphrase extraction is an “easier” task than full text transcription and that keyphrases can be extracted with reasonable precision from transcripts with Word Error Rates (WER) as high as 62%. This robustness to noise can be attributed to the fact that keyphrase words have a lower WER than non-keyphrase words and that they tend to have more redundancy in the audio. From this we conclude that keyphrase extraction is feasible for a wide range of spoken documents, including less-than-broadcast casual speech. We also show that including multiple utterance hypotheses does not improve the precision of the extracted keyphrases.
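The two measures the abstract relies on, Word Error Rate and keyphrase precision, can be illustrated with a minimal sketch. It assumes the conventional definition of WER (word-level edit distance divided by the number of reference words) and a simple exact-match notion of precision. The function names `word_error_rate` and `keyphrase_precision` are hypothetical, and the paper's own evaluation procedure may differ from this.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def keyphrase_precision(extracted, relevant) -> float:
    """Fraction of extracted keyphrases found in the relevant set.

    Assumption: exact, case-insensitive matching of whole phrases.
    """
    extracted = [phrase.lower() for phrase in extracted]
    relevant_set = {phrase.lower() for phrase in relevant}
    if not extracted:
        return 0.0
    hits = sum(1 for phrase in extracted if phrase in relevant_set)
    return hits / len(extracted)
```

Under this definition WER can exceed 100% when the hypothesis contains many inserted words, which is one reason a transcript with a high WER can still preserve most of its content-bearing words.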

We would like to thank Peter Turney for his helpful insights regarding the Extractor system. Thanks also to Bruno Emond, who reviewed an early draft of the paper, and to six anonymous reviewers whose comments resulted in a much improved paper.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Désilets, A., de Bruijn, B., Martin, J. (2002). Extracting Keyphrases from Spoken Audio Documents. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_4

  • DOI: https://doi.org/10.1007/3-540-45637-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43156-5

  • Online ISBN: 978-3-540-45637-7

  • eBook Packages: Springer Book Archive
