Multimedia document retrieval using speech and speaker recognition

  • Original papers
International Journal on Document Analysis and Recognition

Abstract. Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component requires real-time speech recognition to convert the audio to text, together with concurrent speaker analysis, which consists of segmenting the audio into acoustically homogeneous sections followed by speaker identification. The outputs of these two simultaneous processes are used to derive statistics from which indexes for text-based and speaker-based retrieval are built automatically, without user intervention. The real power of multimedia document processing lies in Boolean queries that combine text-based and speaker-based user queries. Retrieval for such queries entails combining the results of the individual text-based and speaker-based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
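
To make the idea of a combined query concrete, the following is a minimal sketch of how the results of a text search and a speaker search might be merged with a Boolean operator. It is an illustration under stated assumptions, not the system described in the paper: the `Segment` record, the two set-based indexes, the function names, and the sample data are all hypothetical, and the sketch uses plain set intersection and union where a real system would likely score and rank its results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One acoustically homogeneous section of audio (hypothetical record)."""
    doc_id: str    # source broadcast
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # label assumed to come from speaker identification
    text: str      # transcript assumed to come from speech recognition


def build_indexes(segments):
    """Build a word -> segment-ids index and a speaker -> segment-ids index."""
    text_index = defaultdict(set)
    speaker_index = defaultdict(set)
    for seg_id, seg in enumerate(segments):
        for word in seg.text.lower().split():
            text_index[word].add(seg_id)
        speaker_index[seg.speaker.lower()].add(seg_id)
    return text_index, speaker_index


def text_search(text_index, query):
    """Return ids of segments whose transcript contains every query word."""
    hits = [text_index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()


def speaker_search(speaker_index, speaker):
    """Return ids of segments attributed to the given speaker label."""
    return set(speaker_index.get(speaker.lower(), set()))


def combined_search(text_index, speaker_index, query, speaker, op="and"):
    """Combine the two individual searches with a Boolean AND or OR."""
    text_hits = text_search(text_index, query)
    speaker_hits = speaker_search(speaker_index, speaker)
    if op == "and":
        return sorted(text_hits & speaker_hits)
    if op == "or":
        return sorted(text_hits | speaker_hits)
    raise ValueError(f"unsupported operator: {op!r}")


# Made-up sample data, for illustration only.
segments = [
    Segment("news1", 0.0, 12.5, "anchor_a", "markets rallied on strong earnings"),
    Segment("news1", 12.5, 30.0, "reporter_b", "earnings beat forecasts this quarter"),
    Segment("news2", 0.0, 9.0, "anchor_a", "storm warnings issued for the coast"),
]
text_index, speaker_index = build_indexes(segments)

# Segments where anchor_a says "earnings".
both = combined_search(text_index, speaker_index, "earnings", "anchor_a", op="and")
# Segments that mention "earnings" or are spoken by anchor_a.
either = combined_search(text_index, speaker_index, "earnings", "anchor_a", op="or")
```

Keeping the two indexes separate means each search can be run and tested on its own, and the Boolean combination reduces to a single set operation on segment identifiers.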

Additional information

Received November 14, 1999 / Revised January 21, 2000

About this article

Cite this article

Viswanathan, M., Beigi, H., Dharanipragada, S. et al. Multimedia document retrieval using speech and speaker recognition. IJDAR 2, 147–162 (2000). https://doi.org/10.1007/PL00021522

  • DOI: https://doi.org/10.1007/PL00021522