We explore the use of sparse representations for the separation of a monaural mixture signal, where by a sparse representation we mean one in which the number of nonzero elements is smaller than might be expected. This is a surprisingly powerful idea: the ability to express a signal sparsely in some known, and potentially overcomplete, basis constitutes a strong model, while also lending itself to efficient algorithms. In the framework we explore, the representation of the signal is linear in a vector of coefficients. However, because many coefficient values could represent the same signal, the mapping from signal to coefficients is nonlinear, with the coefficients chosen to simultaneously represent the signal and maximize a measure of sparsity. This conversion of the signal into coefficients using L1 optimization is viewed not as a preprocessing step performed before the data reaches the heart of the algorithm, but rather as itself the heart of the algorithm: once the coefficients have been found, only trivial processing remains. We show how, by suitable choice of overcomplete basis, this framework can exploit a variety of cues (e.g., speaker identity, differential filtering, differential attenuation) to accomplish monaural separation. We also discuss two radically different algorithms for finding the required overcomplete dictionaries: one based on nonnegative matrix factorization of isolated sources, and the other based on end-to-end optimization using automatic differentiation.
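The L1-optimization step described above can be sketched as a small linear program: to find coefficients c with x = Dc and minimal ‖c‖₁, split c into nonnegative parts and solve with an off-the-shelf LP solver (the basis-pursuit formulation). The sketch below is illustrative only, assuming `scipy.optimize.linprog` is available and using a random toy dictionary rather than a learned source dictionary.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_code(D, x):
    """Basis pursuit: minimize ||c||_1 subject to D c = x.

    Split c = u - v with u, v >= 0; at the optimum
    ||c||_1 = sum(u) + sum(v), giving an ordinary linear program.
    """
    m, n = D.shape
    cost = np.ones(2 * n)            # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])        # equality constraint: D(u - v) = x
    res = linprog(cost, A_eq=A_eq, b_eq=x, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy overcomplete dictionary: 8 atoms in a 4-dimensional signal space.
rng = np.random.default_rng(0)
D = rng.standard_normal((4, 8))
c_true = np.zeros(8)
c_true[[1, 5]] = [1.0, -2.0]         # a 2-sparse coefficient vector
x = D @ c_true                       # the observed "mixture"
c_hat = sparse_code(D, x)            # recover coefficients from the signal
```

Because the system is underdetermined, many coefficient vectors reproduce x exactly; the LP selects the one of smallest L1 norm, which under suitable conditions coincides with the sparsest.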
Asari, H., Olsson, R.K., Pearlmutter, B.A., Zador, A.M. (2007). Sparsification for Monaural Source Separation. In: Makino, S., Sawada, H., Lee, TW. (eds) Blind Speech Separation. Signals and Communication Technology. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6479-1_14
Print ISBN: 978-1-4020-6478-4
Online ISBN: 978-1-4020-6479-1