Abstract
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement or speaker locations, making it directly comparable to other blind source separation methods, which require no prior knowledge of the recording conditions. Microphone locations are estimated automatically using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
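To make the cross-correlation-based localization step concrete, the following is a minimal sketch of time-delay estimation between two microphone channels. The abstract does not state which cross-correlation variant the system uses; the PHAT-weighted generalized cross-correlation (GCC-PHAT) is assumed here as one common choice. The function name `gcc_phat_delay` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate how much y lags x, in seconds, using GCC-PHAT.

    Illustrative sketch only: the weighting and peak picking are assumptions,
    not the method described in the paper.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum, whitened so that only phase information remains.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))

    # Reorder so that the lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

A positive return value means the second channel receives the signal later than the first. Given delays for several microphone pairs and the estimated array geometry, a source direction or position could then be inferred; that step is not shown here.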
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Himawan, I., McCowan, I., Lincoln, M. (2008). Microphone Array Beamforming Approach to Blind Speech Separation. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_26
DOI: https://doi.org/10.1007/978-3-540-78155-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science, Computer Science (R0)