A Neural Network Keyword Search System for Telephone Speech

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

In this paper we propose a pure “neural network” (NN) based keyword search system for conversational telephone speech, developed in the IARPA Babel program. Using a common keyword search evaluation metric, “actual term weighted value” (ATWV), we demonstrate that our NN keyword search system achieves performance similar to that of a comparable but more complex and slower “hybrid deep neural network - hidden Markov model” (DNN-HMM hybrid) speech recognition system, without using either an HMM decoder or a language model.
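The abstract names ATWV without defining it. As a rough illustration, the sketch below computes a term weighted value in the form commonly given for the NIST spoken term detection evaluations: one minus the keyword-averaged sum of the miss probability and a weighted false-alarm probability, with a weight of 999.9. The "actual" in ATWV means the counts come from the system's own hard decisions at its chosen threshold. The names `KeywordCounts` and `term_weighted_value` are hypothetical, and this is not taken from the paper's own scoring tools.

```python
from dataclasses import dataclass

# Weight on false alarms as commonly quoted for the NIST STD evaluations
# (assumption: the paper uses the standard value).
BETA = 999.9


@dataclass
class KeywordCounts:
    """Per-keyword detection counts (hypothetical container for this sketch)."""
    n_true: int         # occurrences of the keyword in the reference
    n_correct: int      # occurrences the system detected correctly
    n_false_alarm: int  # detections with no matching reference occurrence


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """Return 1 - mean over keywords of (P_miss + beta * P_fa).

    Keywords with no reference occurrences are skipped, because their
    miss probability is undefined.
    """
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has a reference occurrence")
    total = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech,
        # minus the seconds occupied by true occurrences.
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)
```

For example, a single keyword with 4 reference occurrences, 3 correct detections, and 1 false alarm in 10004 seconds of speech gives P_miss = 0.25 and P_fa = 0.0001, so the value is about 0.650. A system that finds everything with no false alarms scores 1.0, and a system that outputs nothing scores 0.0.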

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kilgour, K., Waibel, A. (2014). A Neural Network Keyword Search System for Telephone Speech. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_7

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)