Abstract
Microphone arrays can be advantageously employed in Automatic Speech Recognition (ASR) systems to allow distant-talking interaction. Their beamforming capabilities are used to enhance the speech message while attenuating the undesired contributions of environmental noise and reverberation. The first part of this chapter briefly reviews the state of the art of ASR systems, with particular attention to robustness in distant-talker applications. The objective is to reduce the mismatch between the real noisy data and the acoustic models used by the recognizer; beamforming, speech enhancement, feature compensation, and model adaptation are the techniques adopted to this end. The second part of the chapter describes a microphone-array-based speech recognition system developed at ITC-IRST. It includes a linear array beamformer, an acoustic front-end for speech activity detection and feature extraction, a recognition engine based on Hidden Markov Models, and modules for training and adapting the acoustic models. Finally, the performance of this system on a typical recognition task is reported.
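To make the beamforming idea concrete, the following is a minimal sketch of a delay-and-sum beamformer for a linear array. It is not the ITC-IRST implementation, whose details the abstract does not give. The function names, the far-field plane-wave model, the speed of sound, and the rounding of delays to whole samples are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Integer sample delays that time-align a far-field plane wave.

    mic_positions_m: microphone coordinates along the array axis (metres).
    angle_deg: direction of arrival measured from broadside (0 = straight ahead),
               positive towards the positive end of the array axis.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    # Arrival time at each microphone relative to the array origin:
    # microphones closer to the source receive the wavefront earlier.
    arrival = [-(x * sin_theta) / SPEED_OF_SOUND for x in mic_positions_m]
    latest = max(arrival)
    # Delay each channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate_hz) for t in arrival]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay and average the aligned signals."""
    n_samples = min(len(ch) for ch in channels)
    output = [0.0] * n_samples
    for ch, d in zip(channels, delays):
        for n in range(d, n_samples):
            output[n] += ch[n - d]
    return [v / len(channels) for v in output]
```

Calling `steering_delays` with the microphone coordinates and the assumed talker direction, then passing the result to `delay_and_sum` together with the recorded channels, yields a single enhanced signal. Averaging the time-aligned channels reinforces sound from the steered direction, while noise and reflections arriving from other directions add incoherently and are attenuated. Integer delays keep the sketch short; a practical system would typically use fractional delays or frequency-domain phase shifts.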
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Omologo, M., Matassoni, M., Svaizer, P. (2001). Speech Recognition with Microphone Arrays. In: Brandstein, M., Ward, D. (eds) Microphone Arrays. Digital Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04619-7_15
DOI: https://doi.org/10.1007/978-3-662-04619-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07547-6
Online ISBN: 978-3-662-04619-7
eBook Packages: Springer Book Archive