Research on speech separation technology based on deep learning

Zhou, Yan; Zhao, Heming; Chen, Jie; Pan, Xinyu

doi:10.1007/s10586-018-2013-6

Published: 14 February 2018

Volume 22, pages 8887–8897 (2019)
Cluster Computing

Yan Zhou^1,2, Heming Zhao^2, Jie Chen^1,2 & Xinyu Pan^3

Abstract

To address the instability of traditional speech separation algorithms, this paper proposes a deep-learning model for separating reverberant speech. Auditory scene analysis is used to simulate human auditory perception, and the target speech signal is extracted according to the ideal binary mask principle. Because deep neural networks (DNNs) have shown strong learning ability in speech recognition and artificial intelligence, a DNN is proposed that performs dereverberation and denoising by learning the spectral mapping between “contaminated” speech and clean speech. A series of spectral features is extracted, and the temporal dynamics of adjacent frames are fused into them. The DNN transforms the encoded spectrum and restores the clean speech spectrum, from which the time-domain signal is finally reconstructed. In addition, the feature classification ability of the DNN is used to separate two-channel (binaural) reverberant speech: the binaural features ITD (interaural time difference) and ILD (interaural level difference) are fused with the monaural GFCC (gammatone frequency cepstral coefficient) features to form a long feature vector, and the DNN is pre-trained with restricted Boltzmann machines (RBMs) to perform the classification task. The results show that the proposed model improves the quality and intelligibility of the separated speech and significantly enhances the stability of the system.
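Two ideas named in the abstract can be illustrated with a short NumPy sketch: the ideal binary mask (the "ideal two value" principle), which keeps a time-frequency unit only when the target dominates the interference, and the fusion of adjacent frames into one input vector. This is not the authors' implementation. The function names, the 0 dB threshold, the context width, and the edge-padding choice are all assumptions made for illustration.

```python
import numpy as np


def ideal_binary_mask(target_power, interference_power, lc_db=0.0):
    """Return 1.0 where the local SNR in dB exceeds lc_db, else 0.0.

    Both inputs are (num_frames, num_bins) power spectrograms.
    lc_db is the local criterion; 0 dB is an assumed default.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    snr_db = 10.0 * np.log10((target_power + eps) / (interference_power + eps))
    return (snr_db > lc_db).astype(np.float32)


def splice_frames(features, context=2):
    """Stack each frame with `context` neighbours on either side.

    features: (num_frames, num_bins)
    returns:  (num_frames, (2 * context + 1) * num_bins)
    Edge frames are padded by repeating the first or last frame
    (an assumption; other padding schemes are possible).
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)
```

In a pipeline of the kind the abstract describes, the spliced features would be the network input and either a mask like this or the clean spectrum would be the training target. The abstract does not give the exact configuration used.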

Acknowledgement

The authors acknowledge the National Natural Science Foundation of China (Grants 61372146 and 61373098), the Youth Natural Science Foundation of Jiangsu Province of China (Grant BK20160361), the Qinglan Project Young and Middle-aged Academic Leader Foundation of Jiangsu Province, and the Professional Leader Advanced Research Project Foundation of Higher Vocational College of Jiangsu Province (Grant 2017GRFX046).

Author information

Authors and Affiliations

College of Electronic and Information Engineering, Suzhou Vocational University, Suzhou, China
Yan Zhou & Jie Chen
School of Electronic and Information Engineering, Soochow University, Suzhou, China
Yan Zhou, Heming Zhao & Jie Chen
College of Electronics and Information Engineering, Suzhou Science and Technology University, Suzhou, China
Xinyu Pan

Corresponding author

Correspondence to Yan Zhou.

About this article

Cite this article

Zhou, Y., Zhao, H., Chen, J. et al. Research on speech separation technology based on deep learning. Cluster Comput 22 (Suppl 4), 8887–8897 (2019). https://doi.org/10.1007/s10586-018-2013-6

Received: 25 December 2017
Revised: 28 January 2018
Accepted: 02 February 2018
Published: 14 February 2018
Issue Date: July 2019
DOI: https://doi.org/10.1007/s10586-018-2013-6
