A Comparative Study of Spatial Speech Separation Techniques to Improve Speech Recognition

Zhou, Xinhui; Kwan, Chiman; Ayhan, Bulent; Kim, Chanwoo; Kumar, K.; Stern, Richard

doi:10.1007/978-3-319-92537-0_57


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10878)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

Robust speech recognition in noisy and reverberant conditions has been an important research area in recent years. Here we present a comparative study of several spatial speech separation methods. The main performance metric is word error rate (WER) under different signal-to-noise ratio (SNR) and reverberation conditions. Extensive simulations showed that one technique, known as polyaural processing, performed best.
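
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. This is the standard textbook definition, not the scoring tool used in the study; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: two reference words are dropped by the recognizer.
    wer = word_error_rate("show all ships in the gulf", "show ships in gulf")
    print(f"WER = {wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. In a study like this one, the score would be averaged over a test set at each SNR and reverberation setting.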

Acknowledgments

This work was supported in part by the National Science Foundation under grant IIP-0810012.

Author information

Authors and Affiliations

Signal Processing, Inc., Rockville, MD, USA
Xinhui Zhou, Chiman Kwan & Bulent Ayhan
Carnegie Mellon University, Pittsburgh, PA, USA
Chanwoo Kim, K. Kumar & Richard Stern

Corresponding author

Correspondence to Chiman Kwan.

Editor information

Editors and Affiliations

Texas A&M University at Qatar, Doha, Qatar
Tingwen Huang
Sichuan University, Chengdu, China
Jiancheng Lv
Southeast University, Nanjing, China
Changyin Sun
United Institute of Informatics Problems, Minsk, Belarus
Alexander V. Tuzikov

About this paper

Cite this paper

Zhou, X., Kwan, C., Ayhan, B., Kim, C., Kumar, K., Stern, R. (2018). A Comparative Study of Spatial Speech Separation Techniques to Improve Speech Recognition. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_57

DOI: https://doi.org/10.1007/978-3-319-92537-0_57
Published: 26 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92536-3
Online ISBN: 978-3-319-92537-0
eBook Packages: Computer Science, Computer Science (R0)
