Abstract
This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of the head-mounted or hand-held microphone required by conventional speech recognizers. To improve speech quality under car noise interference, a linear microphone array serves as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between the speech signals collected by different microphones. The estimated delay is then used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay, and that increasing the speech sampling rate helps in determining the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
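The delay estimation and delay-and-sum steps described above can be illustrated with a minimal sketch. The abstract does not define the TDCM, so this example substitutes plain time-domain cross-correlation peak picking as the delay estimator. The function names `estimate_delay` and `delay_and_sum`, the `max_lag` parameter, and the choice of the first channel as the reference are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) by which `sig` trails `ref`.

    Assumption: cross-correlation peak picking stands in for the paper's
    TDCM, which the abstract does not specify. Both inputs must have the
    same length.
    """
    n = len(ref)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = ref[:n - lag], sig[lag:]
        else:
            a, b = ref[-lag:], sig[:n + lag]
        # Correlation between the overlapping parts at this candidate lag.
        scores.append(np.dot(a, b))
    return int(lags[int(np.argmax(scores))])


def delay_and_sum(channels, max_lag):
    """Align every channel to channel 0 and average the aligned signals."""
    ref = np.asarray(channels[0], dtype=float)
    out = np.zeros_like(ref)
    for ch in channels:
        ch = np.asarray(ch, dtype=float)
        d = estimate_delay(ref, ch, max_lag)
        # Undo the estimated delay. np.roll is a circular shift, so samples
        # at the edges wrap around; a real system would pad or trim instead.
        out += np.roll(ch, -d)
    return out / len(channels)
```

Because the lags here are whole samples, the delay resolution is one sampling period, which is in line with the abstract's observation that a higher sampling rate helps in determining the time delay. Averaging the aligned channels reinforces the speech while uncorrelated noise partially cancels.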
Cite this article
Chien, JT., Lai, JR. Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 141–151 (2004). https://doi.org/10.1023/B:VLSI.0000015093.07192.eb
DOI: https://doi.org/10.1023/B:VLSI.0000015093.07192.eb