Abstract
Distant speech recognition with microphone arrays is challenging, especially in multi-source environments. This paper proposes a non reference anchor array (NRA) framework for distant speech recognition. In addition to the primary array, which captures the speech source of interest, the NRA framework uses a non reference anchor array to capture the interfering speech sources. The framework uses a linearly constrained minimum variance (LCMV) beamformer, so that the signal coming from the look direction is preserved while correlated interferences coming from the same direction as the source of interest are rejected. The proposed method is evaluated through experiments on clean speech acquisition from distant microphones and on distant speech recognition using the TIMIT and MONC databases. The experimental results indicate a reasonable improvement over correlation, subspace, and standard minimum variance beamforming methods.
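As a rough illustration of the LCMV idea mentioned in the abstract, the sketch below computes the textbook narrowband LCMV weights, w = R^-1 C (C^H R^-1 C)^-1 f, with one constraint that preserves the look direction and one that nulls an interferer. It is not the authors' NRA implementation. The uniform linear array geometry, the 2 kHz frequency, the source angles, the synthetic covariance matrix, and the helper names `steering_vector` and `lcmv_weights` are all assumptions made for the example. The interferer direction is simply assumed known here, standing in for whatever information the anchor array would supply.

```python
import numpy as np

def steering_vector(num_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    positions = np.arange(num_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def lcmv_weights(R, C, f, loading=1e-3):
    """Textbook LCMV solution w = R^-1 C (C^H R^-1 C)^-1 f.

    A small diagonal loading term (relative to the mean diagonal of R)
    is added for numerical robustness; its value is an assumption.
    """
    n = R.shape[0]
    R_loaded = R + loading * np.real(np.trace(R)) / n * np.eye(n)
    Rinv_C = np.linalg.solve(R_loaded, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical scenario: 8 microphones, 4 cm spacing, one frequency bin.
num_mics, spacing_m, freq_hz = 8, 0.04, 2000.0
a_look = steering_vector(num_mics, spacing_m, 0.0, freq_hz)   # source of interest
a_int = steering_vector(num_mics, spacing_m, 40.0, freq_hz)   # interfering talker

# Synthetic spatial covariance: target + interferer + white sensor noise.
R = (np.outer(a_look, a_look.conj())
     + np.outer(a_int, a_int.conj())
     + 0.01 * np.eye(num_mics))

# Constraints: unit gain toward the look direction, zero gain toward the interferer.
C = np.column_stack([a_look, a_int])
f = np.array([1.0, 0.0], dtype=complex)

w = lcmv_weights(R, C, f)

# Beamformer response in a direction a is w^H a (np.vdot conjugates its first argument).
print("gain toward look direction:", abs(np.vdot(w, a_look)))
print("gain toward interferer:   ", abs(np.vdot(w, a_int)))
```

In a real system the covariance matrix would be estimated per frequency bin from recorded frames rather than built from known steering vectors, and the constraint set would depend on how the interference is characterised.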
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shukla, A., Nathwani, K., Hegde, R.M. (2012). An Adaptive Non Reference Anchor Array Framework for Distant Speech Recognition. In: Lin, W., et al. Advances in Multimedia Information Processing – PCM 2012. PCM 2012. Lecture Notes in Computer Science, vol 7674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34778-8_20
DOI: https://doi.org/10.1007/978-3-642-34778-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34777-1
Online ISBN: 978-3-642-34778-8
eBook Packages: Computer Science (R0)