Convolutive Nonnegative Matrix Factorization with Markov Random Field Smoothing for Blind Unmixing of Multichannel Speech Recordings

Zdunek, Rafal

doi:10.1007/978-3-642-25020-0_4

Rafal Zdunek²⁰

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7015))

Included in the following conference series:

International Conference on Nonlinear Speech Processing

987 Accesses
1 Citations

Abstract

The problem of blind unmixing of multichannel speech recordings in an underdetermined and convolutive case is discussed. A power spectrogram of each source is modeled by superposition of nonnegative rank-1 basic spectrograms, which leads to a Nonnegative Matrix Factorization (NMF) model for each source. Since the number of recording channels may be lower than the number of true sources (speakers), under-determinedness is possible. Hence, any meaningful a priori information about the source or the mixing operator can improve the results of blind separation. In our approach, we assume that the basic rank-1 power spectrograms are locally smoothed both in frequency as well as time domains. To enforce the local smoothness, we incorporate the Markov Random Field (MRF) model in the form of the Gibbs prior to the complete data likelihood function. The simulations demonstrate that this approach considerably improves the separation results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 54.99; Price excludes VAT (USA)

Softcover Book: USD 69.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Benesty, J., Sondhi, M.M., Huang, Y. (eds.): Springer Handbook of Speech Processing. Springer, Heidelberg (2008)
Google Scholar
Smaragdis, P.: Convolutive speech bases and their application to supervised speech separation. IEEE Transactions on Audio, Speech and Language Processing 15(1), 1–12 (2007)
Article Google Scholar
Ozerov, A., Fevotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans. Audio, Speech and Lang. Proc. 18(3), 550–563 (2010)
Article Google Scholar
Lee, D.D., Seung, H.S.: Learning of the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
Article MATH Google Scholar
Benaroya, L., Gribonval, R., Bimbot, F.: Non-negative sparse representation for wiener based source separation with a single sensor. In: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2003), Hong Kong, pp. 613–616 (2003)
Google Scholar
Fevotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the itakura-saito divergence. with application to music analysis. Neural Computation 21(3), 793–830 (2009)
Article MATH Google Scholar
Zdunek, R., Cichocki, A.: Blind image separation using nonnegative matrix factorization with Gibbs smoothing. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds.) ICONIP 2007, Part II. LNCS, vol. 4985, pp. 519–528. Springer, Heidelberg (2008)
Chapter Google Scholar
Green, P.J.: Bayesian reconstruction from emission tomography data using a modified EM algorithm. IEEE Transaction on Medical Imaging 9, 84–93 (1990)
Article Google Scholar
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley and Sons (2009)
Google Scholar
Lange, K., Carson, R.: EM reconstruction algorithms for emission and transmission tomography. Journal of Computer Assisted Tomography 8(2), 306–316 (1984)
Google Scholar
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39(1), 1–38 (1977)
MathSciNet MATH Google Scholar
Ochal, P.: Application of convolutive nonnegative matrix factorization for separation of muscial instrument sounds from multichannel polyphonic recordings. M.Sc. thesis (supervised by Dr. R. Zdunek), Wroclaw University of Technology, Poland (2010) (in Polish)
Google Scholar
Vincent, E., Gribonval, R., Fevotte, C.: Performance measurement in blind audio source separation. IEEE Trans. Audio, Speech and Lang. Proc. 14(4), 1462–1469 (2006)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Telecommunications, Teleinformatics and Acoustics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Rafal Zdunek

Authors

Rafal Zdunek
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institute for Technological Development and Innovation in Communications (IDETIC), Signals and Communications Department, University of Las Palmas de Gran Canaria, Campus de Tafira, s/n, 35017, Las Palmas de Gran Canaria, Spain
Carlos M. Travieso-González & Jesús B. Alonso-Hernández &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zdunek, R. (2011). Convolutive Nonnegative Matrix Factorization with Markov Random Field Smoothing for Blind Unmixing of Multichannel Speech Recordings. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science(), vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_4

Download citation

DOI: https://doi.org/10.1007/978-3-642-25020-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics