Abstract
This paper addresses blind unmixing of multichannel speech recordings in the underdetermined, convolutive case. The power spectrogram of each source is modeled as a superposition of nonnegative rank-1 basic spectrograms, which leads to a Nonnegative Matrix Factorization (NMF) model for each source. Because the number of recording channels may be lower than the number of true sources (speakers), the problem can be underdetermined. Hence, any meaningful a priori information about the sources or the mixing operator can improve the results of blind separation. In our approach, we assume that the basic rank-1 power spectrograms are locally smooth in both the frequency and time domains. To enforce this local smoothness, we incorporate a Markov Random Field (MRF) model, in the form of a Gibbs prior, into the complete-data likelihood function. Simulations demonstrate that this approach considerably improves the separation results.
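The two modeling ideas in the abstract can be illustrated with a minimal sketch; it is not the algorithm from the paper. It shows that a superposition of nonnegative rank-1 spectrograms is the same thing as an NMF product W H, and it evaluates an unnormalized Gibbs log-prior that penalizes roughness of the spectral bases along frequency and of the activations along time. The quadratic potential, the first-order neighborhood, the weight `alpha`, the function names, and the matrix sizes are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def rank1_superposition(W, H):
    """Sum the K rank-1 spectrograms w_k h_k^T (equivalent to W @ H)."""
    return sum(np.outer(W[:, k], H[k, :]) for k in range(W.shape[1]))

def gibbs_log_prior(W, H, alpha=1.0):
    """Unnormalized log of a Gibbs prior exp(-alpha * U(W, H)).

    U is an assumed quadratic energy over first-order neighbours:
    adjacent frequency bins in each column of W and adjacent time
    frames in each row of H.
    """
    energy_freq = np.sum(np.diff(W, axis=0) ** 2)  # roughness along frequency
    energy_time = np.sum(np.diff(H, axis=1) ** 2)  # roughness along time
    return -alpha * float(energy_freq + energy_time)

rng = np.random.default_rng(0)
F, T, K = 6, 8, 3              # hypothetical sizes: bins, frames, components
W = rng.random((F, K))         # nonnegative spectral bases
H = rng.random((K, T))         # nonnegative temporal activations
V = rank1_superposition(W, H)  # modeled power spectrogram of one source

log_prior_rough = gibbs_log_prior(W, H)
log_prior_flat = gibbs_log_prior(np.ones((F, K)), np.ones((K, T)))
```

In this sketch, perfectly flat factors incur zero energy, while rough factors receive a lower log-prior, which is the sense in which such a prior favors locally smooth solutions.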
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zdunek, R. (2011). Convolutive Nonnegative Matrix Factorization with Markov Random Field Smoothing for Blind Unmixing of Multichannel Speech Recordings. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_4
DOI: https://doi.org/10.1007/978-3-642-25020-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science (R0)