This chapter discusses source separation methods for the case in which only a single-channel observation is available. The problem is underdetermined: multiple source signals must be extracted from a single stream of observations. To overcome this mathematical intractability, prior information on the source characteristics is generally assumed and used to derive a source separation algorithm. This chapter describes one monaural source separation approach, which exploits a priori sets of time-domain basis functions learned by independent component analysis (ICA). The inherent time structure of sound sources is reflected in the ICA basis functions, which encode the sources in a statistically efficient manner. A detailed derivation of the source separation algorithm is given, taking as inputs the observed single-channel data and the sets of basis functions. The prior knowledge provided by the basis functions and the associated coefficient densities makes it possible to infer the original source signals. A flexible model for density estimation allows accurate modeling of the observation, and the experimental results show a high level of separation performance both for simulated mixtures and for real-environment recordings of mixtures of two different sources.
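The inference step summarized above can be illustrated with a minimal NumPy sketch for a single frame of samples. It is not the chapter's algorithm, and several assumptions are made. The mixing gains are taken to be one (y = x1 + x2). The learned ICA filter matrices are replaced by two hand-picked orthonormal bases: the identity, under which clicks are sparse, and a DCT, under which tones are sparse. The flexible density model is replaced by a fixed log-cosh super-Gaussian prior on the coefficients. Under these assumptions, estimating x1 reduces to maximizing log p(W1 x1) + log p(W2 (y - x1)). This objective is concave, and the sketch solves it with plain gradient ascent. The names `dct_basis`, `objective`, and `separate_frame` are hypothetical.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II analysis matrix (rows are basis vectors).

    Hypothetical stand-in for a learned ICA filter matrix W; chosen only
    because it is orthonormal and makes sinusoid-like signals sparse.
    """
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c


def log_prior(s):
    # Assumed super-Gaussian coefficient density: log p(s) = -log cosh(s) + const.
    # logaddexp(s, -s) - log 2 equals log cosh(s) without overflow.
    return -np.sum(np.logaddexp(s, -s) - np.log(2.0))


def objective(x1, y, w1, w2):
    # Joint log-likelihood (up to constants) of both sources' coefficients,
    # with the second source constrained to x2 = y - x1.
    return log_prior(w1 @ x1) + log_prior(w2 @ (y - x1))


def separate_frame(y, w1, w2, step=0.5, n_iter=2000):
    """Gradient-ascent MAP estimate of (x1, x2) subject to y = x1 + x2.

    For orthonormal w1, w2 the gradient is 2-Lipschitz, so step = 0.5
    never decreases the objective.
    """
    x1 = 0.5 * y  # start from an even split of the mixture
    for _ in range(n_iter):
        s1 = w1 @ x1
        s2 = w2 @ (y - x1)
        # d/dx1 of objective: -W1^T tanh(s1) + W2^T tanh(s2)
        grad = -w1.T @ np.tanh(s1) + w2.T @ np.tanh(s2)
        x1 = x1 + step * grad
    return x1, y - x1


n = 64
w1 = np.eye(n)      # source 1: sparse in time (clicks)
w2 = dct_basis(n)   # source 2: sparse in the DCT domain (tones)

s1_true = np.zeros(n)
s1_true[[5, 40]] = [10.0, -8.0]
s2_true = np.zeros(n)
s2_true[[3, 17]] = [9.0, -7.0]
x1_true = w1.T @ s1_true   # synthesis x = W^T s for an orthonormal W
x2_true = w2.T @ s2_true
y = x1_true + x2_true      # single-channel mixture

x1_hat, x2_hat = separate_frame(y, w1, w2)
err_hat = np.linalg.norm(x1_hat - x1_true)
err_split = np.linalg.norm(0.5 * y - x1_true)
print(f"error with priors: {err_hat:.2f}, error of naive y/2 split: {err_split:.2f}")
```

A practical system would learn W1 and W2 from training recordings of each source, process overlapping frames, and estimate the mixing gains. The log-cosh prior also shrinks large coefficients slightly, so the estimate lands close to the true x1 but not exactly on it.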
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Jang, GJ., Lee, TW. (2007). Monaural Source Separation. In: Makino, S., Sawada, H., Lee, TW. (eds) Blind Speech Separation. Signals and Communication Technology. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6479-1_12
DOI: https://doi.org/10.1007/978-1-4020-6479-1_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6478-4
Online ISBN: 978-1-4020-6479-1
eBook Packages: Engineering; Engineering (R0)