DOI: 10.1145/3207677.3277918
research-article

Keyword Spotting Based on Hypothesis Boundary Realignment and State-Level Confidence Weighting

Published: 22 October 2018

ABSTRACT

Keyword spotting (KWS) identifies keywords in speech utterances. A two-stage approach is often used for its flexibility and efficiency: a keyword hypothesis detection stage, followed by a verification stage that classifies each hypothesis as a hit or a false alarm. Reducing false alarms is the key difficulty of the verification stage and is formulated as a confidence measure (CM) problem. This paper proposes a novel keyword-filler hidden Markov model (HMM) based method that combines two improvements. First, to make the confidence measure more effective, a hypothesis boundary realignment method yields more precise hypothesized segments for candidate keywords, and an overlap ratio criterion is defined to evaluate this realignment. Second, a state-level confidence weighting method improves the posterior-probability-based CM. Experiments show that each improvement is effective on its own, and that the method combining both gives the best performance.
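
The abstract does not give formulas for the overlap ratio criterion or the state-level weighting, so the following is only a minimal sketch under stated assumptions, not the authors' definitions. It assumes the overlap ratio is the intersection-over-union of a hypothesized segment and a reference segment, and that the weighted confidence is a normalized weighted mean of per-state scores. The names `overlap_ratio`, `weighted_confidence`, `Segment`, and the weight values are hypothetical.

```python
from typing import Sequence, Tuple

# A time segment as (start, end) in seconds; this representation is an assumption.
Segment = Tuple[float, float]


def overlap_ratio(hyp: Segment, ref: Segment) -> float:
    """Intersection-over-union of two time segments (assumed definition)."""
    intersection = max(0.0, min(hyp[1], ref[1]) - max(hyp[0], ref[0]))
    union = (hyp[1] - hyp[0]) + (ref[1] - ref[0]) - intersection
    return intersection / union if union > 0 else 0.0


def weighted_confidence(state_scores: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Normalized weighted mean of per-state confidence scores.

    How the weights are chosen is not stated in the abstract; here they are
    simply supplied by the caller.
    """
    if len(state_scores) != len(weights):
        raise ValueError("state_scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, state_scores)) / total


if __name__ == "__main__":
    # A realigned hypothesis compared against a reference keyword segment.
    print(overlap_ratio((0.0, 1.0), (0.5, 1.5)))
    # Three HMM states, with the middle state weighted more heavily.
    print(weighted_confidence([0.9, 0.6, 0.8], [1.0, 2.0, 1.0]))
```

With uniform weights the second function reduces to a plain average over states, which gives a baseline to compare any weighting scheme against.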

      • Published in

        CSAE '18: Proceedings of the 2nd International Conference on Computer Science and Application Engineering
        October 2018
        1083 pages
        ISBN: 9781450365123
        DOI: 10.1145/3207677

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        CSAE '18 Paper Acceptance Rate: 189 of 383 submissions, 49%. Overall Acceptance Rate: 368 of 770 submissions, 48%.