A near-end listening enhancement system by RNN-based noise cancellation and speech modification

Li, Gang; Hu, Ruimin; Wang, Xiaochen; Zhang, Rui

doi:10.1007/s11042-018-6947-8

Published: 05 December 2018

Multimedia Tools and Applications, Volume 78, pages 15483–15505 (2019)

Gang Li^1,2,3, Ruimin Hu^1,2,3, Xiaochen Wang^1 & Rui Zhang^1

Abstract

Near-end listening enhancement (NELE) is a technology that improves speech intelligibility for people listening to a phone in noisy environments. The complex acoustic environments of mobile communication have motivated considerable NELE research. Although many NELE systems have been proposed, they rely solely on speech modification to enhance intelligibility. Few researchers have attempted to enhance intelligibility further through noise cancellation, because traditional noise cancellation is based on adaptive filtering. In the most common handset mode, adaptive filtering performs poorly because the feedback microphone is exposed to a complex environment and provides inadequate feedback. With the rapid development of deep neural networks (DNNs), a DNN, and in particular a recurrent neural network (RNN), can predict the noise signal for noise cancellation without a feedback microphone. In this study, we propose a NELE system based on RNN-based noise cancellation and speech modification (RNC-SM), which introduces a noise cancellation function after speech modification. Compared with existing NELE systems, the RNC-SM system effectively improves both objective speech intelligibility index (SII) scores and subjective listening quality.
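The adaptive filtering mentioned in the abstract can be illustrated with a generic least-mean-squares (LMS) canceller. The sketch below is a textbook LMS loop, not the RNC-SM system or the authors' code. The function names, the path coefficients, the step size, and the test signal are all assumptions made for illustration. It shows that the weight update is driven by the error term, which in a real device would come from the feedback microphone.

```python
import random

def lms_cancel(reference, primary, taps=4, mu=0.05):
    """Return the residual left after LMS adaptive cancellation.

    reference: noise samples picked up by a reference microphone
    primary:   the same noise as heard at the ear, after an unknown path
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, primary):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        # The error is what a feedback (error) microphone would measure.
        error = d - estimate
        # LMS update: step the weights along the instantaneous gradient.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual

def mean_power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

# Hypothetical test signal: white noise through a short, made-up acoustic path.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
path = [0.6, 0.3, -0.2]  # assumed coefficients, for illustration only
at_ear = [
    sum(path[k] * noise[n - k] for k in range(len(path)) if n - k >= 0)
    for n in range(len(noise))
]

residual = lms_cancel(noise, at_ear)
early = mean_power(residual[:200])   # residual power while still adapting
late = mean_power(residual[-200:])   # residual power after convergence
```

In this idealized setting the error signal is clean, so the residual power falls sharply once the filter has adapted. When the error signal is unreliable, as the abstract argues for handset mode, the update has nothing trustworthy to follow. That is the motivation the abstract gives for predicting the noise with an RNN instead.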

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (Nos. 61801334, 61762005, and U1736206).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, China
Gang Li, Ruimin Hu, Xiaochen Wang & Rui Zhang
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, China
Gang Li & Ruimin Hu
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430079, China
Gang Li & Ruimin Hu

Corresponding author

Correspondence to Ruimin Hu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, G., Hu, R., Wang, X. et al. A near-end listening enhancement system by RNN-based noise cancellation and speech modification. Multimed Tools Appl 78, 15483–15505 (2019). https://doi.org/10.1007/s11042-018-6947-8

Received: 12 June 2018
Revised: 10 November 2018
Accepted: 23 November 2018
Published: 05 December 2018
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s11042-018-6947-8
