Two speaker speech separation by LP residual weighting and harmonics enhancement

Krishnamoorthy, P.; Mahadeva Prasanna, S. R.

doi:10.1007/s10772-010-9074-0

Published: 26 May 2010

Volume 13, pages 117–139, (2010)

International Journal of Speech Technology

Abstract

This paper presents a method for separating the speech of individual speakers from the combined speech of two speakers. The main objective of this work is to demonstrate the significance of combining excitation source based temporal processing with short-time spectrum based spectral processing for separating the speech produced by individual speakers. Speech in a two-speaker environment is collected simultaneously over two spatially separated microphones. The speech signals are first subjected to temporal processing based on excitation source information, namely the linear prediction (LP) residual. In temporal processing, the speech of each speaker is enhanced with respect to the other by deriving a speaker-specific weight function that relatively emphasizes the speech around the instants of significant excitation of the desired speaker. To further improve the separation, the temporally processed speech is subjected to spectral processing. This step enhances the regions around the pitch and harmonic peaks of the short-time spectra computed from the temporally processed speech. For this purpose, the pitch is estimated from the temporally processed speech. The performance of the proposed method is evaluated using two kinds of measures. The objective quality measures are percentage of energy loss, percentage of noise residue, signal-to-noise ratio (SNR) gain and perceptual evaluation of speech quality (PESQ). The subjective quality measure is the mean opinion score (MOS). Experimental results are reported for both real and synthetic speech mixtures. In terms of SNR gain and MOS, the proposed combined temporal and spectral processing method provides average improvements of 5.83% and 8.06%, respectively, over the best performing individual temporal or spectral processing method.
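The temporal processing summarized above starts from the LP residual. As a minimal sketch (not the authors' implementation), the residual can be obtained in three steps:

1. Run frame-wise autocorrelation LP analysis.
2. Solve for the predictor coefficients with the Levinson-Durbin recursion.
3. Inverse filter the speech with those coefficients.

A (weighted) residual can then be turned back into speech by all-pole synthesis. The following are illustrative assumptions; the abstract does not state these parameters:

- the names `lp_residual` and `lp_synthesize`;
- the LP order of 10;
- the 160-sample frame, i.e. 20 ms at an assumed 8 kHz sampling rate.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a[0] = 1, so the inverse filter is
    A(z) = 1 + sum_{k=1}^{p} a[k] z^-k and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop the recursion
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, order=10, frame_len=160):
    """Frame-wise LP inverse filtering; returns (residual, per-frame coefficients)."""
    residual = [0.0] * len(x)
    coeffs = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        n = len(frame)
        # Hamming window is used for analysis only, not for inverse filtering.
        win = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) if n > 1 else 1.0
               for i in range(n)]
        a, _ = levinson_durbin(autocorrelation([s * w for s, w in zip(frame, win)], order), order)
        coeffs.append(a)
        for m in range(start, start + n):
            # e[m] = x[m] + sum_k a[k] x[m-k]; samples before the signal start count as zero
            residual[m] = sum(a[k] * x[m - k] for k in range(order + 1) if m - k >= 0)
    return residual, coeffs


def lp_synthesize(excitation, coeffs, frame_len=160):
    """All-pole synthesis 1/A(z), switching to each frame's coefficients in turn."""
    y = [0.0] * len(excitation)
    for m in range(len(excitation)):
        a = coeffs[m // frame_len]
        y[m] = excitation[m] - sum(a[k] * y[m - k] for k in range(1, len(a)) if m - k >= 0)
    return y
```

For example, `lp_synthesize(*lp_residual(x))` reproduces `x` up to rounding. This is a quick check that analysis and synthesis use matching frames before any weighting is applied to the residual.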
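The speaker-specific weighting of the LP residual can be illustrated with the following sketch. It assumes that the instants of significant excitation (epochs) of the desired speaker are already available as sample indices. How they are located and attributed to one speaker, for example using the two microphone signals, is not shown. The following are hypothetical choices for illustration only, not the weight function derived in the paper:

- the function names;
- the raised-cosine shape;
- the default half-width of 16 samples;
- the floor value of 0.2.

```python
import math


def excitation_weight(n_samples, epochs, half_width=16, floor=0.2):
    """Hypothetical speaker-specific weight function.

    Samples within +/- half_width of an epoch of the desired speaker get a
    raised-cosine weight that peaks at 1.0 on the epoch; all other samples are
    attenuated to `floor`, which de-emphasizes the interfering speaker.
    """
    w = [floor] * n_samples
    for e in epochs:
        lo, hi = max(0, e - half_width), min(n_samples, e + half_width + 1)
        for n in range(lo, hi):
            taper = 0.5 * (1.0 + math.cos(math.pi * (n - e) / half_width))
            # where regions of neighbouring epochs overlap, keep the larger weight
            w[n] = max(w[n], floor + (1.0 - floor) * taper)
    return w


def weight_residual(residual, weights):
    """Multiply the LP residual sample by sample with the weight function."""
    return [r * g for r, g in zip(residual, weights)]
```

In this sketch the weighted residual would then excite the time-varying LP synthesis filter to produce the temporally processed speech. With `floor=1.0` the weighting reduces to the identity, so the floor is the single parameter controlling how strongly samples away from the desired speaker's epochs are suppressed.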
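The abstract states that the pitch used for spectral processing is estimated from the temporally processed speech, but it does not name the estimator. The sketch below uses a plain normalized-autocorrelation pitch estimator as a stand-in; it is not necessarily the method used by the authors. The name `estimate_f0`, the 60 to 400 Hz search range and the voicing threshold of 0.3 are assumptions.

```python
def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, voicing_threshold=0.3):
    """Autocorrelation pitch estimate in Hz; returns 0.0 for frames judged unvoiced."""
    n = len(frame)
    if n == 0:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC before correlating
    r0 = sum(s * s for s in x)
    if r0 == 0.0:
        return 0.0
    lag_min = max(1, int(fs / f_max))
    lag_max = min(n - 1, int(fs / f_min))
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # normalized (biased) autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0 or best_val < voicing_threshold:
        return 0.0
    return fs / best_lag
```

The biased autocorrelation decreases with lag. In this sketch that favours the fundamental period over its integer multiples.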
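For the spectral processing step, the sketch below emphasizes short-time spectral bins near the pitch and its harmonics, attenuates the remaining bins, and then resynthesizes the frame. It is a simplified, hypothetical version, not the paper's method. The following are assumptions:

- the binary gain shape;
- the 40 Hz band around each harmonic;
- the floor gain of 0.3;
- passing unvoiced frames (f0 = 0) through unchanged.

A naive O(N²) DFT is used only to keep the example dependency-free. Practical code would use an FFT with overlapping windowed frames and overlap-add.

```python
import cmath
import math


def harmonic_weights(f0, fs, n_fft, bandwidth_hz=40.0, floor=0.3):
    """Gains for bins 0..n_fft//2: 1.0 within bandwidth_hz/2 of a harmonic of f0, `floor` elsewhere."""
    n_bins = n_fft // 2 + 1
    if f0 <= 0.0:  # unvoiced frame: assumed to pass through unchanged
        return [1.0] * n_bins
    bin_hz = fs / n_fft
    weights = [floor] * n_bins
    for b in range(n_bins):
        freq = b * bin_hz
        h = round(freq / f0)  # nearest harmonic number
        if h >= 1 and abs(freq - h * f0) <= bandwidth_hz / 2.0:
            weights[b] = 1.0
    return weights


def dft(x, inverse=False):
    """Naive DFT; inverse=True gives the unscaled inverse transform."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def enhance_harmonics(frame, f0, fs, **kwargs):
    """Apply harmonic_weights to one frame's spectrum and return the resynthesized frame."""
    n = len(frame)
    spectrum = dft(frame)
    half = harmonic_weights(f0, fs, n, **kwargs)
    # mirror the one-sided gains so the spectrum stays conjugate-symmetric (real output)
    gains = [half[min(k, n - k)] for k in range(n)]
    shaped = [s * g for s, g in zip(spectrum, gains)]
    return [v.real / n for v in dft(shaped, inverse=True)]
```

Binary gains are the simplest choice. Smoother gain shapes around each harmonic are a possible variant.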
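Among the objective measures listed, the SNR gain is the simplest to illustrate. The sketch below assumes a common global definition: the output SNR of the separated signal minus the input SNR of the mixture, both measured against the clean speech of the desired speaker. The abstract does not give the paper's exact definition (for example, segmental versus global SNR, or how signals are time-aligned), and the function names are hypothetical.

```python
import math


def snr_db(reference, estimate):
    """Global SNR (dB) of `estimate` with respect to the clean `reference`."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def snr_gain_db(clean, mixture, separated):
    """SNR improvement obtained by separation: output SNR minus input SNR."""
    return snr_db(clean, separated) - snr_db(clean, mixture)
```

A positive gain means that, under this definition, the separated signal is closer to the clean speech than the mixture was.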

Similar content being viewed by others

Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers

Article 29 November 2016

Hybrid speech enhancement with empirical mode decomposition and spectral subtraction for efficient speaker identification

Article 29 August 2015

Single-channel speech separation using combined EMD and speech-specific information

Article 23 October 2017

Author information

Authors and Affiliations

R&D Center, Samsung India Electronics Pvt. Ltd., Noida, India
P. Krishnamoorthy
Department of ECE, Indian Institute of Technology Guwahati, Guwahati, Assam, India
S. R. Mahadeva Prasanna

Corresponding author

Correspondence to P. Krishnamoorthy.

About this article

Cite this article

Krishnamoorthy, P., Mahadeva Prasanna, S.R. Two speaker speech separation by LP residual weighting and harmonics enhancement. Int J Speech Technol 13, 117–139 (2010). https://doi.org/10.1007/s10772-010-9074-0

Received: 03 March 2010
Accepted: 04 May 2010
Published: 26 May 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s10772-010-9074-0
