
Monaural Source Separation


Part of the book series: Signals and Communication Technology ((SCT))

This chapter discusses source separation methods for the case where only a single-channel observation is available. The problem is underdetermined, in that multiple source signals must be extracted from a single stream of observations. To overcome this mathematical intractability, prior information on the source characteristics is generally assumed and used to derive a source separation algorithm. This chapter describes one monaural source separation approach, which exploits a priori sets of time-domain basis functions learned by independent component analysis (ICA). The inherent time structure of sound sources is reflected in the ICA basis functions, which encode the sources in a statistically efficient manner. The source separation algorithm is derived in detail, given the observed single-channel data and the sets of basis functions. The prior knowledge given by the basis functions and the associated coefficient densities enables inference of the original source signals. A flexible model for density estimation allows accurate modeling of the observation, and the experimental results exhibit a high level of separation performance for simulated mixtures as well as real-environment recordings containing mixtures of two different sources.
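The idea summarized above, inferring two sources from one mixture by using a known basis and a coefficient density for each source, can be illustrated with a small sketch. This is not the chapter's algorithm: it assumes unit mixing gains, replaces the learned ICA bases with two fixed orthonormal bases (identity and DCT, chosen only for illustration), and replaces the flexible coefficient density with a smoothed Laplacian. The names `dct_basis`, `objective`, and `separate` are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II matrix; rows act as analysis filters (assumed stand-in for an ICA basis)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)
    return c

def log_prior(s, eps):
    """Smoothed Laplacian log-density of the coefficients, up to a constant (assumed prior)."""
    return -np.sum(np.sqrt(s * s + eps))

def objective(x1, y, w1, w2, eps=1e-2):
    """Sum of both sources' log-priors, with x2 fixed by the mixture constraint x1 + x2 = y."""
    return log_prior(w1 @ x1, eps) + log_prior(w2 @ (y - x1), eps)

def separate(y, w1, w2, step=0.02, iters=2000, eps=1e-2):
    """Gradient ascent on the objective over x1; x2 is always y - x1."""
    x1 = 0.5 * y  # start by splitting the mixture evenly
    for _ in range(iters):
        s1 = w1 @ x1
        s2 = w2 @ (y - x1)
        g1 = -s1 / np.sqrt(s1 * s1 + eps)  # d log p(s1) / d s1
        g2 = -s2 / np.sqrt(s2 * s2 + eps)  # d log p(s2) / d s2
        # Chain rule: s2 depends on x1 through -w2, hence the minus sign.
        x1 = x1 + step * (w1.T @ g1 - w2.T @ g2)
    return x1, y - x1

# Toy mixture: source 1 is sparse in time, source 2 is sparse in the DCT domain.
n = 64
w1 = np.eye(n)
w2 = dct_basis(n)
true1 = np.zeros(n)
true1[[5, 20, 41]] = [3.0, -2.5, 3.5]
true2 = 5.0 * w2[7]
y = true1 + true2

est1, est2 = separate(y, w1, w2)
print("objective at start:", objective(0.5 * y, y, w1, w2))
print("objective at end:  ", objective(est1, y, w1, w2))
```

Because the second estimate is defined as the mixture minus the first, the two estimates always sum to the observation; the priors alone decide how the signal is divided between them. With bases learned from training data, as the chapter describes, `w1` and `w2` would be replaced by the learned inverse basis matrices.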




© 2007 Springer

About this chapter

Cite this chapter

Jang, GJ., Lee, TW. (2007). Monaural Source Separation. In: Makino, S., Sawada, H., Lee, TW. (eds) Blind Speech Separation. Signals and Communication Technology. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6479-1_12


  • DOI: https://doi.org/10.1007/978-1-4020-6479-1_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6478-4

  • Online ISBN: 978-1-4020-6479-1

  • eBook Packages: Engineering, Engineering (R0)
