A near-end listening enhancement system by RNN-based noise cancellation and speech modification

Multimedia Tools and Applications

Abstract

Near-end listening enhancement (NELE) is a technology that improves speech intelligibility for people listening to a phone call in a noisy environment. The complex acoustic environments of mobile communications have motivated a large body of NELE research. Although many NELE systems have been proposed, they focus only on speech modification to enhance intelligibility; few attempts have been made to enhance intelligibility further through noise cancellation. The reason is that traditional noise cancellation is based on adaptive filtering. In the most common handset mode, adaptive filtering performs poorly because the feedback microphone is exposed to the complex environment and provides inadequate feedback. With the rapid development of deep neural networks (DNNs), a DNN, and in particular a recurrent neural network (RNN), can predict the noise signal for noise cancellation without the feedback microphone. In this study, we propose a NELE system that combines RNN-based noise cancellation and speech modification (RNC-SM), which introduces a noise cancellation function after speech modification. Compared with existing NELE systems, the RNC-SM system effectively improves objective speech intelligibility index (SII) scores and subjective listening quality.
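To make the adaptive-filtering baseline mentioned in the abstract concrete, the following is a minimal sketch of a textbook LMS noise canceller. It is not the RNC-SM system and not the authors' implementation. The function name `lms_cancel`, the three-tap `path`, the sine wave standing in for speech, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def lms_cancel(primary, reference, num_taps=8, mu=0.005):
    """Textbook LMS noise canceller (illustrative sketch, hypothetical names).

    primary   : signal at the ear (speech plus filtered noise)
    reference : noise-only reference signal
    Returns the error signal (the cleaned output) and the final filter weights.
    """
    w = np.zeros(num_taps)
    out = np.zeros(len(primary))
    for n in range(num_taps - 1, len(primary)):
        # Most recent reference sample first, so w[k] pairs with reference[n - k].
        x = reference[n - num_taps + 1:n + 1][::-1]
        y = w @ x                # current estimate of the noise in `primary`
        e = primary[n] - y       # residual after subtracting the estimate
        w += 2 * mu * e * x      # LMS weight update
        out[n] = e
    return out, w

# Synthetic data: every value below is an assumption, not taken from the paper.
rng = np.random.default_rng(0)
n_samples = 5000
t = np.arange(n_samples)
speech = 0.5 * np.sin(2 * np.pi * 0.01 * t)         # stand-in for a speech signal
noise = rng.standard_normal(n_samples)              # ambient noise (reference)
path = np.array([0.6, -0.3, 0.1])                   # hypothetical acoustic path
noise_at_ear = np.convolve(noise, path)[:n_samples]
primary = speech + noise_at_ear

cleaned, w = lms_cancel(primary, noise)

# Compare the error against the clean stand-in signal after convergence.
tail = slice(n_samples // 2, None)
err_before = np.mean((primary[tail] - speech[tail]) ** 2)
err_after = np.mean((cleaned[tail] - speech[tail]) ** 2)
print(f"mean squared error before: {err_before:.4f}, after: {err_after:.4f}")
```

The sketch depends on a clean noise reference, which is the kind of sensor-dependent input that becomes unreliable in handset mode according to the abstract. In the approach the abstract describes, an RNN would instead predict the noise signal, so the feedback microphone is no longer needed.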

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61801334, No. 61762005, No. U1736206).

Author information

Corresponding author

Correspondence to Ruimin Hu.

About this article

Cite this article

Li, G., Hu, R., Wang, X. et al. A near-end listening enhancement system by RNN-based noise cancellation and speech modification. Multimed Tools Appl 78, 15483–15505 (2019). https://doi.org/10.1007/s11042-018-6947-8

  • DOI: https://doi.org/10.1007/s11042-018-6947-8
