
Separation of Reverberant Speech Based on Computational Auditory Scene Analysis


Abstract

This paper proposes a computational auditory scene analysis approach to the separation of room-reverberant speech that combines multi-pitch tracking with supervised classification. The algorithm trains speech and non-speech models separately; each model learns to map harmonic features to a grouping cue that encodes the posterior probability of a time-frequency unit being dominated by the target rather than by periodic interference. A likelihood ratio test then selects the appropriate model for labeling each time-frequency unit. Experimental results show that the proposed approach produces strong pitch tracking results and yields significant improvements in predicted speech intelligibility and quality. Compared with the classical Jin-Wang algorithm, the proposed algorithm improves the average SNR by 1.22 dB.
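The labeling stage described above can be illustrated with a minimal sketch. The diagonal-Gaussian likelihoods, the logistic posterior mapping, and the names UnitLabeler and label_tf_units below are illustrative assumptions standing in for the paper's trained models and harmonic features, not the actual implementation.

```python
import numpy as np

class UnitLabeler:
    """Toy stand-in for one trained model (speech or non-speech interference).
    mean/var define a diagonal Gaussian used by the likelihood ratio test;
    w/b define a logistic map from harmonic features to the posterior of
    target dominance. All parameter values here are illustrative."""
    def __init__(self, mean, var, w, b):
        self.mean, self.var = np.asarray(mean, float), np.asarray(var, float)
        self.w, self.b = np.asarray(w, float), float(b)

    def likelihood(self, x):
        d = np.asarray(x, float) - self.mean
        return float(np.exp(-0.5 * np.sum(d * d / self.var))
                     / np.sqrt(np.prod(2.0 * np.pi * self.var)))

    def posterior_target(self, x):
        return 1.0 / (1.0 + np.exp(-(np.dot(self.w, x) + self.b)))

def label_tf_units(features, speech_model, interference_model, lr_threshold=1.0):
    """For each time-frequency unit, a likelihood ratio test picks the model
    that better explains its harmonic features; that model's posterior then
    labels the unit target-dominated (1) or interference-dominated (0)."""
    mask = np.zeros(len(features), dtype=int)
    for i, x in enumerate(features):
        lr = speech_model.likelihood(x) / max(interference_model.likelihood(x), 1e-12)
        model = speech_model if lr >= lr_threshold else interference_model
        mask[i] = int(model.posterior_target(x) > 0.5)
    return mask

# Toy usage with two-dimensional "harmonic" feature vectors.
speech = UnitLabeler(mean=[1.0, 1.0], var=[0.3, 0.3], w=[2.0, 2.0], b=-1.0)
interf = UnitLabeler(mean=[-1.0, -1.0], var=[0.6, 0.6], w=[0.5, 0.5], b=-2.0)
units = np.array([[1.2, 0.9], [-0.8, -1.1], [0.1, 0.2]])
print(label_tf_units(units, speech, interf))  # prints [1 0 0]
```

In the full system described in the abstract, such unit labels would form a time-frequency mask over the cochleagram, which is then used to resynthesize the separated speech.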




REFERENCES

  1. Wu, M. and Wang, D.L., A two-stage algorithm for one-microphone reverberant speech enhancement, IEEE Trans. Audio Speech Lang. Process., 2006, vol. 14, no. 3, pp. 774–784.

  2. Jin, Z. and Wang, D.L., A supervised learning approach to monaural segregation of reverberant speech, IEEE Trans. Audio Speech Lang. Process., 2009, vol. 17, no. 4, pp. 625–638.

  3. Cooke, M.P., Modeling Auditory Processing and Organization, Cambridge, UK: Cambridge University Press, 1993.

  4. Guo, W. and Yu, F., Speech-music signal separation based on improved time-frequency ratio, Comput. Eng., 2015, vol. 41, no. 3, pp. 287–291.

  5. Moore, B.C.J., An Introduction to the Psychology of Hearing, 5th ed., London: Academic Press.

  6. Zhao, X. and Shao, Y., CASA-based robust speaker identification, IEEE Trans. Audio Speech Lang. Process., 2012, vol. 20, no. 5, pp. 1608–1616.

  7. Ma, J., Research on Blind Separation and Enhancement of Speech Signals, Beijing: Electronic Industry Press, 2012.

  8. Wang, Y., Lin, J., and Yuan, W., Improved speech enhancement based on computational auditory scene analysis, J. East China Univ. Sci. Technol. (Natl. Sci. Ed.), 2012, vol. 38, no. 5, pp. 617–621.

  9. Wu, C., Cochannel Speech Separation Based on Computational Auditory Scene Analysis, Guangxi University, 2014.

  10. Hu, Q., Single-Channel Speech Separation Based on Computational Auditory Scene Analysis, Beijing Jiaotong University, 2014.

  11. Ubul Kurban, Hamdulla Askar, and Aysa Alim, A digital signal processing teaching methodology using Praat, 2009 4th International Conference on Computer Science and Education, Nanning: IEEE, 2009.

  12. Li, H., Qu, J., and Zhang, X., The voiced speech blind signal separation algorithm based on signal energy, J. Jilin Univ. (Eng. Technol. Ed.), 2015, vol. 45, no. 5, pp. 1665–1670.

  13. Zhao, L. and Wang, Z., Monaural voiced speech separation based on harmonic and energy features, Acta Acust., 2012, vol. 37, no. 2, pp. 218–224.

  14. Lehmann, E.A. and Johansson, A.M., Prediction of energy decay in room impulse responses simulated with an image-source model, J. Acoust. Soc. Am., 2008, vol. 124, no. 1, pp. 269–277.

  15. Zhang, X., Liu, Y., and Li, P., Monaural voiced speech segregation based on improved harmonic grouping rules, Acta Acust., 2011, vol. 36, no. 1, pp. 88–96.


ACKNOWLEDGMENTS

This work was supported by the Shanxi Natural Science Foundation (no. 201701D121058).

Author information


Corresponding author

Correspondence to Li Hongyan.

Additional information

The article is published in the original.

About this article


Cite this article

Hongyan, L., Meng, C. & Yue, W. Separation of Reverberant Speech Based on Computational Auditory Scene Analysis. Aut. Control Comp. Sci. 52, 561–571 (2018). https://doi.org/10.3103/S0146411618060068
