Evidence Modeling for Missing Data Speech Recognition Using Small Microphone Arrays

Robust Speech Recognition of Uncertain or Missing Data

Abstract

During the last decade, microphone array processing has emerged as a powerful tool for increasing the noise robustness of automatic speech recognition (ASR) systems. Typically, microphone arrays are used as preprocessors that enhance the incoming speech signal prior to recognition. While such traditional approaches can lead to good results, they usually require large numbers of microphones to reach acceptable performance in practice. Furthermore, important information, such as uncertainty estimates and energy bounds, is often ignored because speech recognition is conventionally performed only on the enhanced output of the array. Using the probabilistic concept of evidence modeling, this chapter presents a novel approach to robust ASR that aims for closer integration of microphone array processing and missing data speech recognition in reverberant multi-speaker environments. The output of the array is used to estimate the probability density function (pdf) of the hidden clean speech data using any information that may be available before and after array processing. The chapter discusses different types of evidence pdfs and shows how these models can be used effectively during HMM decoding.
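
To make the idea of an evidence pdf concrete, the following is a minimal sketch of how such a model could enter the likelihood computation for a single feature dimension of one HMM state. It is not taken from the chapter: the function names, the single diagonal Gaussian per state, and the two evidence shapes (a Gaussian centred on the enhanced feature, and a uniform pdf between energy bounds) are assumptions chosen for illustration. In both cases the state likelihood is the integral of the state pdf weighted by the evidence pdf.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean, var):
    """Cumulative distribution of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def likelihood_gaussian_evidence(x_hat, evidence_var, mean, var):
    """State likelihood under a Gaussian evidence pdf (assumed form).

    The integral of N(x; mean, var) * N(x; x_hat, evidence_var) over x
    has the closed form N(x_hat; mean, var + evidence_var), so the
    uncertainty estimate simply inflates the state variance.
    """
    return gaussian_pdf(x_hat, mean, var + evidence_var)


def likelihood_bounded_evidence(upper, mean, var, lower=0.0):
    """State likelihood under a uniform evidence pdf on [lower, upper].

    Here the clean value is only known to lie between the energy bounds,
    so the state pdf is averaged over that interval.
    """
    width = upper - lower
    return (gaussian_cdf(upper, mean, var) - gaussian_cdf(lower, mean, var)) / width


if __name__ == "__main__":
    # Hypothetical state parameters and observations, for illustration only.
    state_mean, state_var = 2.0, 1.0
    print(likelihood_gaussian_evidence(2.5, 0.0, state_mean, state_var))
    print(likelihood_gaussian_evidence(2.5, 4.0, state_mean, state_var))
    print(likelihood_bounded_evidence(3.0, state_mean, state_var))
```

With zero evidence variance the first function reduces to conventional decoding on the enhanced feature, while the bounded form corresponds to treating the feature as unreliable but constrained by the observed energy.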

Acknowledgements

This research was partly funded by the Australian Research Council (ARC) grant no. DP1096348.

Corresponding author

Correspondence to Marco Kühne.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kühne, M., Togneri, R., Nordholm, S. (2011). Evidence Modeling for Missing Data Speech Recognition Using Small Microphone Arrays. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_11

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
