Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Chien, Jen-Tzung; Lai, Jain-Ray

doi:10.1023/B:VLSI.0000015093.07192.eb

Published: 01 February 2004

Volume 36, pages 141–151, (2004)
Journal of VLSI signal processing systems for signal, image and video technology

Abstract

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of the head-mounted or hand-held microphone required by a conventional speech recognizer. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between the speech signals collected by different microphones. The estimated delay is then used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments on connected Chinese digits, we found that TDCM can effectively estimate the time delay and that increasing the speech sampling rate helps to determine the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
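The acquisition front end described above (estimate the inter-microphone delay, then align and average the channels) can be illustrated with a minimal sketch. The abstract does not define TDCM, so the sketch substitutes a plain normalized cross-correlation as the time-domain similarity measure; it is not the authors' method. The function names, the synthetic white-noise "source", the noise level, and the delays of 0, 3, and 6 samples are all assumptions made for illustration.

```python
import math
import random

def normalized_xcorr(x, y, lag):
    """Normalized correlation between x[n] and y[n + lag] over their overlap."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def estimate_delay(ref, sig, max_lag):
    """Integer lag in [-max_lag, max_lag] that best aligns sig with ref.

    A positive result means sig lags behind ref by that many samples.
    This stands in for the paper's TDCM, which is not specified here.
    """
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: normalized_xcorr(ref, sig, lag))

def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    n = len(channels[0])
    out = []
    for t in range(n):
        acc, count = 0.0, 0
        for ch, d in zip(channels, delays):
            idx = t + d
            if 0 <= idx < n:  # skip samples shifted past the buffer edge
                acc += ch[idx]
                count += 1
        out.append(acc / count if count else 0.0)
    return out

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)

# Synthetic demo (assumed values): one source, three microphones, each
# receiving a delayed copy of the source plus independent noise.
rng = random.Random(0)
n_samples = 2000
true_delays = [0, 3, 6]
source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
mics = []
for d in true_delays:
    delayed = [0.0] * d + source[:n_samples - d]
    mics.append([s + rng.gauss(0.0, 0.5) for s in delayed])

# Estimate each channel's delay relative to the first microphone.
estimated = [estimate_delay(mics[0], m, max_lag=10) for m in mics]
enhanced = delay_and_sum(mics, estimated)

print("estimated delays:", estimated)
print("single-mic MSE:", mse(mics[0], source))
print("beamformed MSE:", mse(enhanced, source))
```

Because the search runs over integer lags, the delay resolution in this sketch is one sample period: 125 µs at 8 kHz versus 62.5 µs at 16 kHz, taking those two rates purely as examples. That is one plausible reading of why a higher sampling rate helps delay estimation, though the abstract itself does not give the reason.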

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 70101, Taiwan, Republic of China
Jen-Tzung Chien & Jain-Ray Lai

About this article

Cite this article

Chien, JT., Lai, JR. Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 141–151 (2004). https://doi.org/10.1023/B:VLSI.0000015093.07192.eb

Issue Date: February 2004