Evidence Modeling for Missing Data Speech Recognition Using Small Microphone Arrays

Kühne, Marco; Togneri, Roberto; Nordholm, Sven

doi:10.1007/978-3-642-21317-5_11


Abstract

During the last decade, microphone array processing has emerged as a powerful tool for increasing the noise robustness of automatic speech recognition (ASR) systems. Typically, microphone arrays are used as preprocessors that enhance the incoming speech signal prior to recognition. While such traditional approaches can lead to good results, they usually require a large number of microphones to reach acceptable performance in practice. Furthermore, important information, such as uncertainty estimates and energy bounds, is often ignored, because speech recognition is conventionally performed only on the enhanced output of the array. Using the probabilistic concept of evidence modeling, this chapter presents a novel approach to robust ASR that aims at a closer integration of microphone array processing and missing data speech recognition in reverberant multi-speaker environments. The output of the array is used to estimate the probability density function (pdf) of the hidden clean speech data, using any information that may be available before and after array processing. The chapter discusses different types of evidence pdfs and shows how these models can be used effectively during HMM decoding.
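To make the idea of an evidence pdf concrete, the following is a minimal sketch of the general missing-data formulation, not the specific evidence model developed in this chapter. The score of an HMM state q is taken as the integral of its output pdf against the evidence pdf e(x) of the hidden clean feature, i.e. ∫ p(x | q) e(x) dx. Three common choices are assumed here: a Dirac delta at the observed value (conventional decoding), a Gaussian centred on an enhanced estimate (uncertainty decoding), and a uniform pdf between 0 and the observed mixture energy (bounded marginalization, with the observation acting as an energy bound). The function names, the tuple encoding of the evidence, the mask-to-evidence rule, and the use of single diagonal Gaussians on non-negative spectral features are all illustrative assumptions.

```python
import math


def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gauss_cdf(x, mean, var):
    """Univariate Gaussian cumulative distribution function at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def evidence_likelihood(mean, var, evidence):
    """Integral of N(x; mean, var) * e(x) dx for one feature dimension.

    `evidence` is a hypothetical tuple encoding of the evidence pdf e(x):
      ("delta", y)          e(x) = delta(x - y): ordinary likelihood at y
      ("gauss", y_hat, s2)  e(x) = N(x; y_hat, s2): model and evidence variances add
      ("uniform", lo, hi)   e(x) = 1 / (hi - lo) on [lo, hi]: bounded marginalization
    """
    kind = evidence[0]
    if kind == "delta":
        return gauss_pdf(evidence[1], mean, var)
    if kind == "gauss":
        _, y_hat, s2 = evidence
        return gauss_pdf(y_hat, mean, var + s2)
    if kind == "uniform":
        _, lo, hi = evidence
        mass = gauss_cdf(hi, mean, var) - gauss_cdf(lo, mean, var)
        return mass / (hi - lo)
    raise ValueError(f"unknown evidence type: {kind!r}")


def evidences_from_mask(observed, mask):
    """Hypothetical helper: reliable bins get a delta at the observation;
    unreliable bins get a uniform pdf on [0, observed], treating the observed
    mixture energy as an upper bound on the clean speech energy."""
    return [("delta", y) if reliable else ("uniform", 0.0, y)
            for y, reliable in zip(observed, mask)]


def frame_log_likelihood(means, variances, evidences):
    """Log evidence likelihood of one frame under a diagonal-covariance Gaussian state."""
    return sum(math.log(evidence_likelihood(m, v, e))
               for m, v, e in zip(means, variances, evidences))


if __name__ == "__main__":
    observed = [2.0, 0.5, 3.0]       # toy non-negative spectral energies
    mask = [True, False, True]       # toy reliability mask (True = reliable)
    means, variances = [1.8, 1.0, 2.5], [0.5, 0.4, 1.0]
    ev = evidences_from_mask(observed, mask)
    print(frame_log_likelihood(means, variances, ev))
```

In a decoder, a score like `frame_log_likelihood` would take the place of the ordinary state output score; for a Gaussian mixture state, the per-component evidence likelihoods would be weighted by the mixture weights and summed before taking the logarithm.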




Acknowledgements

This research was partly funded by the Australian Research Council (ARC) grant no. DP1096348.

Author information

Authors and Affiliations

The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
Marco Kühne & Roberto Togneri
Curtin University of Technology, U1987, Perth, WA, 6845, Australia
Sven Nordholm


Corresponding author

Correspondence to Marco Kühne.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach


About this chapter

Cite this chapter

Kühne, M., Togneri, R., Nordholm, S. (2011). Evidence Modeling for Missing Data Speech Recognition Using Small Microphone Arrays. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_11


DOI: https://doi.org/10.1007/978-3-642-21317-5_11
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)
