Separation of Reverberant Speech Based on Computational Auditory Scene Analysis

Hongyan, Li; Meng, Cao; Yue, Wang

doi:10.3103/S0146411618060068


Published: 28 January 2019

Volume 52, pages 561–571, (2018)

Automatic Control and Computer Sciences

Li Hongyan¹, Cao Meng¹ & Wang Yue¹


Abstract

This paper proposes a computational auditory scene analysis (CASA) approach to separating room-reverberant speech that combines multi-pitch tracking with supervised classification. The algorithm trains speech and non-speech models separately; each learns to map harmonic features to a grouping cue that encodes the posterior probability of a time-frequency unit being dominated by the target or by periodic interference. A likelihood ratio test then selects the correct model for labeling each time-frequency unit. Experimental results show that the proposed approach produces strong pitch tracking results and leads to significant improvements in predicted speech intelligibility and quality. Compared with the classical Jin-Wang algorithm, the proposed algorithm improves the average SNR by 1.22 dB.
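The labeling step and the SNR figure can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the likelihood ratio test, so the per-unit log-ratio, the `threshold` parameter, the array layout (channels by frames), and the function names `label_units` and `snr_db` are all assumptions made for illustration. The two posterior arrays stand in for the outputs of the separately trained speech and non-speech models.

```python
import numpy as np


def label_units(p_target, p_interf, threshold=0.0, eps=1e-12):
    """Label time-frequency units with a per-unit log-likelihood ratio test.

    p_target, p_interf: arrays of shape (channels, frames) holding each
    model's posterior that a unit is dominated by its source. These are
    assumed inputs; the paper's feature extraction is not reproduced here.
    Returns a binary mask: 1 where the target model wins, 0 otherwise.
    """
    p_target = np.clip(np.asarray(p_target, dtype=float), eps, 1.0)
    p_interf = np.clip(np.asarray(p_interf, dtype=float), eps, 1.0)
    # Log ratio of the two posteriors; a positive value favors the target.
    llr = np.log(p_target) - np.log(p_interf)
    return (llr > threshold).astype(np.uint8)


def snr_db(reference, estimate):
    """Standard SNR in dB of an estimate against a clean reference signal."""
    reference = np.asarray(reference, dtype=float)
    noise = reference - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))
```

Under these assumptions, `label_units` yields a binary time-frequency mask that could be applied to the mixture before resynthesis, and the difference between two `snr_db` values for the same reference gives an SNR gain of the kind reported relative to the Jin-Wang algorithm.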


REFERENCES

Mingyang Wu and DeLiang Wang, A two-stage algorithm for one-microphone reverberant speech enhancement, IEEE Trans. Audio Speech Lang. Process., 2006, vol. 14, no. 3, pp. 774–784.
Zhaozhang Jin and DeLiang Wang, A supervised learning approach to monaural segregation of reverberant speech, IEEE Trans. Audio Speech Lang. Process., 2009, vol. 17, no. 4, pp. 625–638.
Cooke, M.P., Modeling Auditory Processing and Organization, Cambridge, UK: Cambridge University Press, 1993.
Wei Guo and Fengjin Yu, Speech-music signal separation based on improved time-frequency ratio, Comput. Eng., 2015, vol. 41, no. 3, pp. 287–291.
Moore, B.C.J., An Introduction to the Psychology of Hearing, London: Academic Press, 5th ed.
Xiaojia Zhao and Yang Shao, CASA-based robust speaker identification, IEEE Trans. Audio Speech Lang. Process., 2012, vol. 20, no. 5, pp. 1608–1616.
Jianfen Ma, Research on Blind Separation and Enhancement of Speech Signals, Beijing: Electronic Industry Press, 2012.
Yu Wang, Jiajun Lin, and Wenhao Yuan, Improved speech enhancement based on computational auditory scene analysis, J. East China Univ. Sci. Technol. (Natl. Sci. Ed.), 2012, vol. 38, no. 5, pp. 617–621.
Chun Wu, Cochannel Speech Separation Based on Computational Auditory Scene Analysis, Guangxi University, 2014.
Qi Hu, Single-Channel Speech Separation Based on Computational Auditory Scene Analysis, Beijing Jiaotong University, 2014.
Ubul Kurban, Hamdulla Askar, and Aysa Alim, A digital signal processing teaching methodology using Praat, 2009 4th International Conference on Computer Science and Education, Nanning: IEEE, 2009.
Li Hong-yan, Qu Jun-ling, and Zhang Xue-ying, The voiced speech blind signal separation algorithm based on signal energy, J. Jilin Univ. Eng. Technol. Ed., 2015, vol. 45, no. 5, pp. 1665–1670.
Liheng Zhao and Zhengfu Wang, Monaural voiced speech separation based on harmonic and energy features, Acta Acust., 2012, vol. 37, no. 2, pp. 218–224.
Lehmann, E.A. and Johansson, A.M., Prediction of energy decay in room impulse responses simulated with an image-source model, J. Acoust. Soc. Am., 2008, vol. 124, no. 1, pp. 269–277.
Xueliang Zhang, Yiju Liu, and Peng Li, Monaural voiced speech segregation based on improved harmonic grouping rules, Acta Acust., 2011, vol. 36, no. 1, pp. 88–96.


ACKNOWLEDGMENTS

This work was supported by Shanxi Natural Science Foundation (no. 201701D121058).

Author information

Authors and Affiliations

College of Information Engineering, Taiyuan University of Technology, 030024, Taiyuan, China
Li Hongyan, Cao Meng & Wang Yue


Corresponding author

Correspondence to Li Hongyan.

Additional information

The article is published in the original.

About this article

Cite this article

Hongyan, L., Meng, C. & Yue, W. Separation of Reverberant Speech Based on Computational Auditory Scene Analysis. Aut. Control Comp. Sci. 52, 561–571 (2018). https://doi.org/10.3103/S0146411618060068


Received: 07 March 2018
Revised: 12 April 2018
Accepted: 17 April 2018
Published: 28 January 2019
Issue Date: November 2018
DOI: https://doi.org/10.3103/S0146411618060068

Keywords:
