Abstract
Near-end listening enhancement (NELE) is a technology that improves speech intelligibility for people listening on the phone in noisy environments. The complex acoustic environments of mobile communication have motivated considerable NELE research. However, the many NELE systems proposed so far focus only on speech modification to enhance intelligibility; few studies have attempted to enhance intelligibility further through noise cancellation. This is because traditional noise cancellation is based on adaptive filtering: in the most common handset mode, the feedback microphone is exposed to a complex environment, so the feedback is inadequate and the noise cancellation result is poor. With the rapid development of deep neural networks (DNNs), a DNN, and in particular a recurrent neural network (RNN), can predict noise signals for noise cancellation without a feedback microphone. In this study, we propose a NELE system based on RNN-based noise cancellation and speech modification (RNC-SM), which introduces a noise cancellation function after speech modification. Compared with existing NELE systems, the RNC-SM system effectively improves both the objective speech intelligibility index (SII) scores and the subjective listening quality.
Acknowledgments
This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (Nos. 61801334, 61762005, and U1736206).
Cite this article
Li, G., Hu, R., Wang, X. et al. A near-end listening enhancement system by RNN-based noise cancellation and speech modification. Multimed Tools Appl 78, 15483–15505 (2019). https://doi.org/10.1007/s11042-018-6947-8
DOI: https://doi.org/10.1007/s11042-018-6947-8