Microphone Array Beamforming Approach to Blind Speech Separation

Himawan, Ivan; McCowan, Iain; Lincoln, Mike

doi:10.1007/978-3-540-78155-4_26

Ivan Himawan^1,3, Iain McCowan^1,2 & Mike Lincoln^3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods, which require no prior knowledge of recording conditions. Microphone locations are automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
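
The abstract does not say which cross-correlation method is used for speaker localization. As an illustration only, the sketch below estimates the time delay between two microphone signals with GCC-PHAT, a widely used cross-correlation technique; it is an assumption, not necessarily the authors' method. The function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Illustrative sketch: x and y are 1-D signals from two microphones,
    fs is the sampling rate in Hz, max_delay optionally bounds the search.
    """
    # Zero-pad so the FFT-based correlation is linear rather than circular.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum, whitened so only phase information remains.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Limit the lag search range (at least one sample on each side).
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))

    # Reorder so lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

A positive return value means the sound reached the second microphone later than the first. Delays from several microphone pairs, combined with the estimated array geometry, could then be used to steer a beamformer toward each speaker.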

Author information

Authors and Affiliations

Queensland University of Technology, Brisbane QLD, Australia
Ivan Himawan & Iain McCowan
CSIRO e-HEALTH Research Centre, Brisbane QLD, Australia
Iain McCowan
Centre for Speech Technology Research, Edinburgh, United Kingdom
Ivan Himawan & Mike Lincoln

Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard

Cite this paper

Himawan, I., McCowan, I., Lincoln, M. (2008). Microphone Array Beamforming Approach to Blind Speech Separation. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_26

DOI: https://doi.org/10.1007/978-3-540-78155-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science, Computer Science (R0)
