Multimedia document retrieval using speech and speaker recognition

Viswanathan, Mahesh; Beigi, Homayoon S.M.; Dharanipragada, Satya; Maali, Fereydoun; Tritschler, Alain

doi:10.1007/PL00021522

Original papers
Published: June 2000

Volume 2, pages 147–162, (2000)

International Journal on Document Analysis and Recognition

Mahesh Viswanathan¹,
Homayoon S.M. Beigi¹,
Satya Dharanipragada¹,
Fereydoun Maali² &
Alain Tritschler¹

Abstract. Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text-based and speaker-based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
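
The combination step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the record types `WordHit` and `SpeakerSegment`, the function names, and the sample data are all hypothetical, and the rule used for a Boolean AND (a recognized word counts only if its timestamp falls inside a segment attributed to the requested speaker) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHit:
    """One recognized word in the (hypothetical) text index."""
    doc_id: str
    time: float  # seconds from the start of the recording
    word: str


@dataclass(frozen=True)
class SpeakerSegment:
    """One acoustically homogeneous segment with an identified speaker."""
    doc_id: str
    start: float
    end: float
    speaker: str


def search_text(word_hits, term):
    """Text-based search: exact, case-insensitive match on the word."""
    term = term.lower()
    return [h for h in word_hits if h.word.lower() == term]


def search_speaker(segments, speaker):
    """Speaker-based search: all segments attributed to one speaker."""
    return [s for s in segments if s.speaker == speaker]


def combine_and(text_hits, speaker_segments):
    """Assumed AND rule: keep word hits that lie inside a matching segment."""
    return [
        h for h in text_hits
        if any(s.doc_id == h.doc_id and s.start <= h.time < s.end
               for s in speaker_segments)
    ]


def combine_or(text_hits, speaker_segments):
    """Assumed OR rule: documents matched by either search."""
    return sorted({h.doc_id for h in text_hits}
                  | {s.doc_id for s in speaker_segments})


# Made-up sample data, for illustration only.
words = [
    WordHit("news1", 12.0, "election"),
    WordHit("news1", 95.5, "election"),
    WordHit("news2", 8.0, "weather"),
]
segments = [
    SpeakerSegment("news1", 0.0, 60.0, "anchor_a"),
    SpeakerSegment("news1", 60.0, 120.0, "reporter_b"),
    SpeakerSegment("news2", 0.0, 30.0, "anchor_a"),
]

text_hits = search_text(words, "election")
speaker_hits = search_speaker(segments, "anchor_a")

both = combine_and(text_hits, speaker_hits)
either = combine_or(text_hits, speaker_hits)

print([(h.doc_id, h.time) for h in both])
print(either)
```

In this sketch the two searches run independently and are merged only at query time, which mirrors the abstract's description of combining individual text and speaker results; a real system would likely also rank the merged results, which is omitted here.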

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA; e-mail: {maheshv, beigi, dsatya}@watson.ibm.com, trischl@ifrance.com
Mahesh Viswanathan, Homayoon S.M. Beigi, Satya Dharanipragada & Alain Tritschler
Signal Recognition Corporation, P.O. Box 7010, New York, NY 10128, USA; e-mail: maali-sigrec@worldnet.att.net
Fereydoun Maali

Additional information

Received November 14, 1999 / Revised January 21, 2000

About this article

Cite this article

Viswanathan, M., Beigi, H., Dharanipragada, S. et al. Multimedia document retrieval using speech and speaker recognition. IJDAR 2, 147–162 (2000). https://doi.org/10.1007/PL00021522

Issue Date: June 2000
DOI: https://doi.org/10.1007/PL00021522

Key words: Audio indexing – Speech recognition – Speaker recognition – Speaker segmentation – Spoken document analysis
