Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology

Abstract

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of the head-mounted or hand-held microphone required by conventional speech recognizers. To improve speech quality under car noise interference, a linear microphone array serves as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between the speech signals collected by different microphones. The estimated delay is then used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay, and that increasing the speech sampling rate helps determine the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
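
The acquisition stage described in the abstract (estimate the inter-microphone delay, then align and sum the channels) can be illustrated with a minimal sketch. The abstract does not define TDCM, so the sketch below substitutes a plain time-domain cross-correlation peak search as a stand-in; it is not the authors' measure. The function names `estimate_delay` and `delay_and_sum`, the `max_lag` parameter, and the synthetic four-microphone signal are all assumptions made for illustration.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) by which `sig` trails `ref`.

    Stand-in for TDCM (assumption): picks the peak of the plain
    time-domain cross-correlation over lags in [-max_lag, max_lag].
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    n = min(len(ref), len(sig))
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:n - lag], sig[lag:n]
        else:
            a, b = ref[-lag:n], sig[:n + lag]
        score = float(np.dot(a, b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag):
    """Align every channel to channel 0 and average the aligned channels."""
    channels = np.asarray(channels, dtype=float)
    ref = channels[0]
    delays, aligned = [0], [ref]
    for sig in channels[1:]:
        d = estimate_delay(ref, sig, max_lag)
        delays.append(d)
        # np.roll is a circular shift: samples wrap around at the edges,
        # which is acceptable here only because the signal edges are silent.
        aligned.append(np.roll(sig, -d))
    return np.mean(aligned, axis=0), delays


# Synthetic example (hypothetical data): one source, four microphones,
# each channel delayed by a known number of samples plus independent noise.
rng = np.random.default_rng(0)
clean = np.zeros(400)
clean[50:350] = rng.standard_normal(300)
true_delays = [0, 3, 6, 9]
mics = [np.roll(clean, d) + 0.5 * rng.standard_normal(400) for d in true_delays]

enhanced, estimated = delay_and_sum(mics, max_lag=20)
print("estimated delays:", estimated)
print("single-mic MSE:", np.mean((mics[0] - clean) ** 2))
print("beamformed MSE:", np.mean((enhanced - clean) ** 2))
```

Because the delay in this sketch is resolved only to whole samples, a higher sampling rate gives a finer delay grid, which is one plausible reading of why the abstract reports that increasing the sampling rate helps delay estimation.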

Cite this article

Chien, JT., Lai, JR. Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 141–151 (2004). https://doi.org/10.1023/B:VLSI.0000015093.07192.eb

  • DOI: https://doi.org/10.1023/B:VLSI.0000015093.07192.eb