Novel methods for query selection and query combination in query-by-example spoken term detection

Authors:
Javier Tejedor, Universidad Autónoma de Madrid, Madrid, Spain
Igor Szöke, Brno University of Technology, Brno, Czech Republic
Michal Fapso, Brno University of Technology, Brno, Czech Republic

SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech, October 2010, Pages 15–20. https://doi.org/10.1145/1878101.1878106

Published: 29 October 2010

ABSTRACT

Query-by-example (QbE) spoken term detection (STD) is necessary in low-resource scenarios, where training material is scarce and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD. The first introduces several criteria for selecting the optimal example to use as the query throughout the search system. The second presents a novel feature-level combination of examples to construct a more robust query for the search. Experiments on within-language and cross-lingual QbE STD setups show a significant improvement in both setups when the query is selected according to an optimal criterion rather than at random, and a significant improvement when several examples are combined to build the input query compared with using the single best example. The results are also comparable to those of a state-of-the-art acoustic keyword spotting system.
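
The abstract does not spell out the selection criteria or the combination procedure, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each spoken example is a sequence of feature vectors, that the "best" example is the one with the lowest average dynamic time warping (DTW) cost to the other examples, and that combination averages the frames of the remaining examples onto the selected one along the DTW alignment. All function names (`frame_dist`, `dtw`, `select_query`, `combine`) are hypothetical.

```python
from math import sqrt


def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(ref, other):
    """Return (length-normalised cost, alignment path) for two frame sequences."""
    n, m = len(ref), len(other)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(ref[i - 1], other[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were aligned.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m] / (n + m), path


def select_query(examples):
    """Index of the example with the lowest mean DTW cost to all others.

    Assumed criterion for illustration; needs at least two examples.
    """
    def mean_cost(k):
        others = [e for idx, e in enumerate(examples) if idx != k]
        return sum(dtw(examples[k], e)[0] for e in others) / len(others)

    return min(range(len(examples)), key=mean_cost)


def combine(examples, best):
    """Average the other examples onto the selected one along DTW paths."""
    ref = examples[best]
    dim = len(ref[0])
    sums = [list(frame) for frame in ref]
    counts = [1] * len(ref)
    for idx, ex in enumerate(examples):
        if idx == best:
            continue
        _, path = dtw(ref, ex)
        for i, j in path:
            for d in range(dim):
                sums[i][d] += ex[j][d]
            counts[i] += 1
    return [[v / c for v in frame] for frame, c in zip(sums, counts)]
```

In this sketch the combined query keeps the length of the selected example, so it can be passed to the same search procedure as a single example would be.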

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
October 2010
72 pages
ISBN:9781450301626
DOI:10.1145/1878101
Program Chairs:
Martha Larson, Delft University of Technology, Netherlands
Roeland Ordelman, Netherlands Institute for Sound & Vision and University of Twente, Netherlands
Florian Metze, Carnegie Mellon University, USA
Franciska de Jong, University of Twente, Netherlands
Wessel Kraaij, TNO and Radboud University, Netherlands
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
query combination
query selection
query-by-example
speech recognition
Qualifiers
- research-article