ABSTRACT
Query-by-example (QbE) spoken term detection (STD) is necessary in low-resource scenarios where training material is scarce and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD. The first introduces several criteria for selecting the optimal example to use as the query throughout the search system. The second presents a novel feature-level example combination that constructs a more robust query for the search. Experiments on within-language and cross-lingual QbE STD setups show that selecting the query according to an optimal criterion significantly outperforms random selection in both setups, and that combining several examples to build the input query yields a further significant improvement over using the single best example. The results also show performance comparable to that of a state-of-the-art acoustic keyword spotting system.
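The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection criteria or the combination procedure, so everything here is an assumption made for illustration: each example is taken to be a list of feature frames (for instance posterior vectors), examples are aligned with plain dynamic time warping (DTW), the selected query is the example with the lowest average DTW cost to the others, and combination averages the frames aligned to each frame of the selected example. The function names `dtw_path`, `select_query`, and `combine_examples` are hypothetical.

```python
import math


def frame_distance(p, q):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_path(x, y):
    """Align sequences x and y with basic DTW; return (total cost, path)."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end to recover the frame-to-frame alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return acc[n][m], path


def select_query(examples):
    """Assumed criterion: index of the example closest on average to the rest."""
    best_index, best_score = 0, float("inf")
    for i, candidate in enumerate(examples):
        others = [e for k, e in enumerate(examples) if k != i]
        total = sum(dtw_path(candidate, other)[0] for other in others)
        score = total / max(len(others), 1)
        if score < best_score:
            best_index, best_score = i, score
    return best_index


def combine_examples(reference, other):
    """Average each reference frame with the frames of `other` aligned to it."""
    _, path = dtw_path(reference, other)
    dim = len(reference[0])
    sums = [list(frame) for frame in reference]
    counts = [1] * len(reference)
    for i, j in path:
        for k in range(dim):
            sums[i][k] += other[j][k]
        counts[i] += 1
    return [[v / c for v in row] for row, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy two-dimensional "posterior" frames, invented for this sketch.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    c = [[0.0, 1.0], [1.0, 0.0]]
    best = select_query([a, b, c])
    print("selected example:", best)
    print("combined query:", combine_examples(a, b))
```

The combined query keeps the length of the reference example, so it can be passed to the same search back end as a single example. Other distance measures, selection criteria, or weighting schemes could be substituted without changing the overall structure.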
Index Terms
- Novel methods for query selection and query combination in query-by-example spoken term detection