DOI: 10.1145/1878101.1878106
research-article

Novel methods for query selection and query combination in query-by-example spoken term detection

Published: 29 October 2010

ABSTRACT

Query-by-example (QbE) spoken term detection (STD) is necessary for low-resource scenarios in which training material is scarce and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD. The first introduces several criteria for selecting the optimal example to use as the query throughout the search system. The second presents a novel feature-level combination of examples that constructs a more robust query for the search. Experiments on within-language and cross-lingual QbE STD setups show a significant improvement in both setups when the query is selected according to an optimal criterion rather than at random, and a significant improvement when several examples are combined to build the input query compared with using the single best example. They also show performance comparable to that of a state-of-the-art acoustic keyword spotting system.

Published in

SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
October 2010
72 pages
ISBN: 9781450301626
DOI: 10.1145/1878101

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
