Abstract
This paper presents a new statistical model for speech signals, which consists of a time-invariant dictionary incorporating a set of the power spectral densities of excitation signals and a set of all-pole filters where the gain of each pair of excitation and filter elements is allowed to vary over time. We use this model to develop a combined blind separation and dereverberation method for speech. Reasonably good separations were obtained under a highly reverberant condition.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Douglas, S., Sawada, H., Makino, S.: Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters. IEEE Trans. Speech, Audio Process. 13(1), 92–104 (2005)
Smaragdis, P.: Blind separation of convolved mixtures in the frequency domain. Neur. Comp. 22, 21–34 (1998)
Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., Juang, B.-H.: Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation. In: Proc. Int’l. Conf. Acoust., Speech, Signal Process., pp. 85–88 (2008)
Yoshioka, T., Nakatani, T., Miyoshi, M., Okuno, H.G.: Blind separation and dereverberation of speech mixtures by joint optimization. IEEE Trans. Audio, Speech, Language Process (2010) (accepted for publication)
Dégerine, S., Zaïdi, A.: Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach. IEEE Trans. Signal Processing 52(6), 1499–1512 (2004)
Kameoka, H., Kashino, K.: Composite Autoregressive System for Sparse Source-Filter Representation of Speech. In: Proc. 2009 IEEE International Symposium on Circuits and Systems (ISCAS 2009), pp. 2477–2480 (2009)
Benaroya, L., Bimbot, F., Gribonval, R.: Audio source separation with a single sensor. IEEE Trans. Audio Speech Language Processing 14(1), 191–199 (2006)
Févotte, C., Bertin, N., Durrieu, J.-L.: Nonnegative matrix factorization,with the Itakura-Saito divergence. With application to music analysis. Neural Comput. 21(3), 793–830 (2009)
Ozerov, A., Févotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans. Audio, Speech, Language Process. 18(3), 550–563 (2010)
Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)
Sawada, H., Araki, S., Makino, S.: Measuring dependence of binwise separated signals for permutation alignment in frequency-domain BSS. In: Proc. Int’l. Symp. Circ., Syst., pp. 3247–3250 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kameoka, H., Yoshioka, T., Hamamura, M., Le Roux, J., Kashino, K. (2010). Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation. In: Vigneron, V., Zarzoso, V., Moreau, E., Gribonval, R., Vincent, E. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2010. Lecture Notes in Computer Science, vol 6365. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15995-4_31
Download citation
DOI: https://doi.org/10.1007/978-3-642-15995-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15994-7
Online ISBN: 978-3-642-15995-4
eBook Packages: Computer ScienceComputer Science (R0)