Monaural voiced speech segregation based on elaborate harmonic grouping strategies

Liu, WenJu; Zhang, XueLiang; Jiang, Wei; Li, Peng; Xu, Bo

doi:10.1007/s11432-011-4506-2

Research Papers
Special Focus
Published: 03 December 2011

Volume 54, pages 2471–2480 (2011)

Science China Information Sciences

WenJu Liu¹, XueLiang Zhang¹, Wei Jiang¹, Peng Li² & Bo Xu²

Abstract

This paper proposes an enhanced algorithm for monaural voiced speech segregation based on several elaborate harmonic grouping strategies. Its contributions lie in three aspects. First, the algorithm classifies time-frequency (T-F) units as resolved or unresolved by their carrier-to-envelope energy ratio, which gives more accurate classification than cross-channel correlation. Second, resolved T-F units are grouped according to the minimum amplitude principle, which has been verified to exist in human perception, in addition to the harmonic principle. Finally, an “enhanced” envelope autocorrelation function is used to detect amplitude modulation rates, which substantially reduces half-frequency errors when grouping unresolved units. Systematic evaluation and comparison show that the proposed algorithm greatly improves separation performance. Specifically, the signal-to-noise ratio (SNR) improves by 0.96 dB over the previous method. The algorithm also improves the PESQ score and the subjective perception score.
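The idea of reading an amplitude modulation rate off an envelope autocorrelation can be sketched as follows. This is a minimal illustration only: it uses a plain envelope autocorrelation, not the “enhanced” function of the paper, whose definition the abstract does not give. The function name `envelope_acf_rate`, the rectify-and-smooth envelope, the 1 ms smoothing window, and the 80–400 Hz search range are all assumptions made for this example.

```python
import numpy as np

def envelope_acf_rate(x, fs, fmin=80.0, fmax=400.0):
    """Estimate the AM rate (Hz) of a band-limited signal x sampled at fs.

    Hypothetical helper for illustration; not the paper's method.
    """
    # Crude envelope: full-wave rectification, then a ~1 ms moving average.
    win = max(1, int(fs / 1000))
    env = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
    env = env - env.mean()  # remove DC so the ACF reflects the modulation

    # Autocorrelation of the envelope for non-negative lags.
    acf = np.correlate(env, env, mode="full")[len(env) - 1:]

    # Search for the strongest peak inside the assumed rate range.
    lo = int(fs / fmax)
    hi = int(fs / fmin)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / lag
```

For a 3 kHz carrier modulated at 200 Hz and sampled at 16 kHz, the estimate should fall close to 200 Hz. If the peak at twice the true period were selected instead, the estimate would be about 100 Hz, which is the kind of half-frequency error the abstract refers to.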

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
WenJu Liu, XueLiang Zhang & Wei Jiang
Digital Media Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Peng Li & Bo Xu

Corresponding author

Correspondence to WenJu Liu.

Additional information

LIU WenJu was born in 1960. He received the B.S. degree in mathematics from Peking University, the M.S. degree in mathematics from Beijing University of Posts and Telecommunications, and the Ph.D. degree in computer applications from Tsinghua University, Beijing, China, in 1983, 1989 and 1993, respectively. Currently, he is a research professor at the National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include speech recognition, speech synthesis, speaker recognition, keyword spotting, computational auditory scene analysis, speech enhancement, and noise reduction. Dr. Liu Wenju is a member of the Neural Network Committee of China and the Signal Processing Society of the IEEE. He is an editorial board member of the journal of Computer Science Application and a reviewer for numerous academic journals, including IEEE Transactions on Audio, Speech, and Language Processing and Cognitive Computation.

JIANG Wei was born in 1982. He received the B.S. degree from Yanshan University, Qinhuangdao, China, in 2005 and the M.S. degree from Harbin Institute of Technology, Harbin, China, in 2008. He is currently working toward the Ph.D. degree at the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include speech segregation, computational auditory scene analysis and acoustic properties of speech.

ZHANG XueLiang was born in 1981. He received the B.S. degree from Inner Mongolia University, Hohhot, China, in 2003, the M.S. degree from Harbin Institute of Technology, Harbin, China, in 2005, and the Ph.D. degree in Pattern Recognition and Intelligent System from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2010. Currently, he is a lecturer at the Computer Sciences Department, Inner Mongolia University. His research interests include speech separation, computational auditory scene analysis and speech signal processing. Dr. Zhang Xueliang is a member of the International Speech Communication Association.

Electronic supplementary material

Supplementary material, approximately 2.75 MB.

About this article

Cite this article

Liu, W., Zhang, X., Jiang, W. et al. Monaural voiced speech segregation based on elaborate harmonic grouping strategies. Sci. China Inf. Sci. 54, 2471–2480 (2011). https://doi.org/10.1007/s11432-011-4506-2

Received: 17 June 2011
Accepted: 26 September 2011
Published: 03 December 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s11432-011-4506-2
