
Sparsification for Monaural Source Separation

Chapter in the book Blind Speech Separation

Part of the book series: Signals and Communication Technology (SCT)

We explore the use of sparse representations for separation of a monaural mixture signal, where by a sparse representation we mean one where the number of nonzero elements is smaller than might be expected. This is a surprisingly powerful idea, as the ability to express a signal sparsely in some known, and potentially overcomplete, basis constitutes a strong model, while also lending itself to efficient algorithms. In the framework we explore, the representation of the signal is linear in a vector of coefficients. However, because many coefficient values could represent the same signal, the mapping from signal to coefficients is nonlinear, with the coefficients being chosen to simultaneously represent the signal and maximize a measure of sparsity. This conversion of the signal into the coefficients using L1 optimization is viewed not as a preprocessing step performed before the data reaches the heart of the algorithm, but rather as itself the heart of the algorithm: after the coefficients have been found, only trivial processing remains to be done. We show how, by suitable choice of overcomplete basis, this framework can use a variety of cues (e.g., speaker identity, differential filtering, differential attenuation) to accomplish monaural separation. We also discuss two radically different algorithms for finding the required overcomplete dictionaries: one based on nonnegative matrix factorization of isolated sources, and the other based on end-to-end optimization using automatic differentiation.
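To make the decomposition step concrete, the following is a minimal sketch of the idea rather than the chapter's own implementation. It assumes toy nonnegative dictionaries D1 and D2 (standing in for per-speaker dictionaries that might be learned, e.g., by NMF on isolated recordings), concatenates them into one overcomplete dictionary, recovers nonnegative coefficients for a single mixture spectral frame by L1 (linear-programming) minimization via scipy.optimize.linprog, and re-synthesizes each source from its own block of coefficients. All names, sizes, and data here are hypothetical.

```python
# Sketch of separation by sparsification: decompose one mixture spectral frame
# over a concatenated (overcomplete) dictionary and read off per-source parts.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

F = 64   # frequency bins per spectral frame (toy value)
K = 40   # atoms per hypothetical speaker dictionary

# Hypothetical nonnegative per-speaker dictionaries (e.g., from NMF on
# isolated sources), stacked into an overcomplete dictionary of 2K atoms.
D1 = rng.random((F, K))
D2 = rng.random((F, K))
D = np.hstack([D1, D2])

# Synthesize a mixture frame from a few atoms of each speaker (ground truth).
c_true = np.zeros(2 * K)
c_true[rng.choice(K, 3, replace=False)] = rng.random(3)        # speaker 1
c_true[K + rng.choice(K, 3, replace=False)] = rng.random(3)    # speaker 2
x = D @ c_true

# Sparse decomposition: minimize sum(c) subject to D c = x, c >= 0.
# With nonnegative c, sum(c) equals the L1 norm, so this is an ordinary LP.
res = linprog(c=np.ones(2 * K), A_eq=D, b_eq=x,
              bounds=(0, None), method="highs")
c_hat = res.x

# The "trivial processing" after the coefficients are found: each source
# estimate is the reconstruction from its own block of the dictionary.
s1_hat = D1 @ c_hat[:K]
s2_hat = D2 @ c_hat[K:]

print("mixture residual:", np.linalg.norm(D @ c_hat - x))
print("source 1 error:  ", np.linalg.norm(s1_hat - D1 @ c_true[:K]))
print("source 2 error:  ", np.linalg.norm(s2_hat - D2 @ c_true[K:]))
```

Because the coefficients are constrained to be nonnegative, the L1 objective reduces to a plain linear cost, so the decomposition is an ordinary linear program; once the coefficients have been found, the remaining per-source reconstruction is, as the abstract says, trivial.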




Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Asari, H., Olsson, R.K., Pearlmutter, B.A., Zador, A.M. (2007). Sparsification for Monaural Source Separation. In: Makino, S., Sawada, H., Lee, T.-W. (eds.) Blind Speech Separation. Signals and Communication Technology. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6479-1_14


  • DOI: https://doi.org/10.1007/978-1-4020-6479-1_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6478-4

  • Online ISBN: 978-1-4020-6479-1

