
Speech Recognition with Microphone Arrays

Chapter in the book Microphone Arrays

Part of the book series: Digital Signal Processing (DIGSIGNAL)

Abstract

Microphone arrays can be advantageously employed in Automatic Speech Recognition (ASR) systems to allow distant-talking interaction. Their beamforming capabilities are used to enhance the speech message while attenuating the undesired contributions of environmental noise and reverberation. The first part of this chapter briefly reviews the state of the art of ASR systems, with particular attention to robustness in distant-talking applications. The objective is to reduce the mismatch between the real noisy data and the acoustic models used by the recognizer; beamforming, speech enhancement, feature compensation, and model adaptation are the techniques adopted to this end. The second part of the chapter is dedicated to the description of a microphone-array-based speech recognition system developed at ITC-IRST. It includes a linear-array beamformer, an acoustic front-end for speech activity detection and feature extraction, a recognition engine based on Hidden Markov Models, and modules for training and adaptation of the acoustic models. Finally, the performance of this system on a typical recognition task is reported.
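As a rough illustration of the beamforming step mentioned in the abstract, the following Python sketch shows a basic delay-and-sum beamformer: the microphone signals are time-aligned toward an assumed talker position and averaged, which reinforces the direct speech while attenuating uncorrelated noise. This is a minimal sketch of the general technique, not the ITC-IRST system described in the chapter; the function name, the array geometry, and the assumption of a known source position are illustrative only.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_position, fs, c=343.0):
    """Minimal delay-and-sum beamformer (illustrative sketch, not the chapter's system).

    signals         : (num_mics, num_samples) array of synchronized recordings
    mic_positions   : (num_mics, 3) microphone coordinates in metres
    source_position : (3,) assumed talker position in metres
    fs              : sampling rate in Hz
    c               : speed of sound in m/s
    """
    num_mics, num_samples = signals.shape
    # Propagation distance (and hence delay) from the talker to each microphone.
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = distances / c
    # Express each channel's extra delay relative to the closest microphone,
    # rounded to whole samples for simplicity (no fractional-delay filtering).
    lags = np.round((delays - delays.min()) * fs).astype(int)
    output = np.zeros(num_samples)
    for channel, lag in zip(signals, lags):
        aligned = np.roll(channel, -lag)   # advance the later-arriving channel
        if lag > 0:
            aligned[-lag:] = 0.0           # discard samples that wrapped around
        output += aligned
    return output / num_mics
```

The beamformed signal could then be passed to a conventional acoustic front-end (speech activity detection and feature extraction) before HMM decoding, mirroring the pipeline outlined in the abstract.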

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Omologo, M., Matassoni, M., Svaizer, P. (2001). Speech Recognition with Microphone Arrays. In: Brandstein, M., Ward, D. (eds) Microphone Arrays. Digital Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04619-7_15

  • DOI: https://doi.org/10.1007/978-3-662-04619-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-07547-6

  • Online ISBN: 978-3-662-04619-7

  • eBook Packages: Springer Book Archive
