
Whispered Speech Enhancement Based on Improved Mel Frequency Scale and Modified Compensated Phase Spectrum

Circuits, Systems, and Signal Processing

Abstract

This paper investigates whispered speech enhancement based on a novel, improved Mel frequency scale that is derived from the characteristics of whispered speech. During synthesis, the whispered speech magnitude spectrum is recombined with a modified phase spectrum instead of retaining the phase spectrum of the noisy whispered speech. The benefit of this phase correction is that the low-energy components of the resulting complex spectrum cancel out more than the high-energy components, so that as much background noise as possible is removed. Moreover, the noise estimation parameter used in the compensated phase is obtained by a new method. The algorithm seeks a trade-off among whispered speech distortion, noise reduction and the level of residual musical noise. Objective and subjective evaluations show that the proposed algorithm outperforms comparable whispered speech enhancement algorithms.
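The phase-correction idea described above resembles the noise-driven short-time phase spectrum compensation (PSC) procedure of Stark, Wójcicki and Lyons (Interspeech 2008). The Python sketch below implements that standard PSC, not the authors' modified version: it adds an antisymmetric, noise-scaled offset Λ(k) = λΨ(k)|D̂(k)| to the noisy spectrum, borrows only the phase of the result, and keeps the real part of the inverse FFT, which is where low-SNR bins partially cancel. The improved Mel scale and the paper's new noise-estimation method are not reproduced. All function names are hypothetical, the noise estimate (averaging the first few frames, assumed to be noise only) is a placeholder, and the values of λ, the frame length and the hop size are illustrative assumptions.

```python
import numpy as np


def antisymmetric_weights(n):
    """Psi(k): +1 for 0 < k < N/2, -1 for N/2 < k < N,
    0 at k = 0 (and at k = N/2 when N is even)."""
    psi = np.sign(n / 2 - np.arange(n))
    psi[0] = 0.0
    return psi


def estimate_noise_mag(noisy, frame_len, hop, n_frames=6):
    """Hypothetical placeholder noise estimate: mean magnitude spectrum of the
    first n_frames frames, assumed noise-only (NOT the paper's new method)."""
    window = np.hamming(frame_len)
    mags = [np.abs(np.fft.fft(noisy[i * hop:i * hop + frame_len] * window))
            for i in range(n_frames)]
    return np.mean(mags, axis=0)


def psc_frame(frame, noise_mag, lam, window):
    """Standard phase spectrum compensation for one analysis frame."""
    x = np.fft.fft(frame * window)
    # Compensated spectrum: X_lam(k) = X(k) + lam * Psi(k) * |D(k)|
    x_comp = x + lam * antisymmetric_weights(len(frame)) * noise_mag
    # Keep the noisy magnitude, take the compensated phase
    y = np.abs(x) * np.exp(1j * np.angle(x_comp))
    # y is no longer conjugate-symmetric. Taking the real part of the IFFT is
    # equivalent to averaging each bin with the conjugate of its mirror bin,
    # so bins whose compensated phases disagree (low SNR) partially cancel.
    return np.real(np.fft.ifft(y))


def psc_enhance(noisy, noise_mag, frame_len=256, hop=64, lam=3.74):
    """Overlap-add PSC; lam, frame_len and hop are assumed, illustrative values."""
    window = np.hamming(frame_len)
    out = np.zeros(len(noisy))
    wsum = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        seg = slice(start, start + frame_len)
        out[seg] += psc_frame(noisy[seg], noise_mag, lam, window)
        wsum[seg] += window
    # Normalise by the summed analysis windows wherever a frame covered a sample
    covered = wsum > 0
    out[covered] /= wsum[covered]
    return out
```

Setting `lam=0.0` leaves the phase untouched, so `psc_enhance` then simply reconstructs the input. Larger values of λ generally suppress more noise but also distort more speech, which mirrors the distortion/noise-reduction trade-off the abstract describes.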




Acknowledgements

This project was supported by the National Key Research and Development Program of China (Grant No. 2017YFB0503500), the Natural Science Foundation of Jiangsu Province (Grant No. BK20171031) and the Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education (Grant No. 2017VGE01).

Author information


Correspondence to Chen Li.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

The following electronic supplementary material accompanies the article.

Supplementary material 1 (MP4 18 kb)

Supplementary material 2 (MP4 33 kb)

Supplementary material 3 (MP4 19 kb)

Supplementary material 4 (MP4 31 kb)

Supplementary material 5 (MP4 18 kb)

Supplementary material 6 (MP4 18 kb)

Supplementary material 7 (MP4 18 kb)

Supplementary material 8 (MP4 18 kb)

Supplementary material 9 (MP4 33 kb)

Supplementary material 10 (MP4 33 kb)

Supplementary material 11 (MP4 33 kb)

Supplementary material 12 (MP4 33 kb)

Supplementary material 13 (MP4 33 kb)

Supplementary material 14 (MP4 33 kb)

Supplementary material 15 (MP4 33 kb)

Supplementary material 16 (MP4 33 kb)

Supplementary material 17 (MP4 19 kb)

Supplementary material 18 (MP4 19 kb)

Supplementary material 19 (MP4 19 kb)

Supplementary material 20 (MP4 19 kb)

Supplementary material 21 (MP4 31 kb)

Supplementary material 22 (MP4 31 kb)

Supplementary material 23 (MP4 31 kb)

Supplementary material 24 (MP4 31 kb)


About this article


Cite this article

Wei, Y., Li, C., Li, T. et al. Whispered Speech Enhancement Based on Improved Mel Frequency Scale and Modified Compensated Phase Spectrum. Circuits Syst Signal Process 38, 5839–5860 (2019). https://doi.org/10.1007/s00034-019-01164-4


  • DOI: https://doi.org/10.1007/s00034-019-01164-4
