Whispered Speech Enhancement Based on Improved Mel Frequency Scale and Modified Compensated Phase Spectrum

Wei, Yi; Li, Chen; Li, Tianfeng; Zeng, Yumin

doi:10.1007/s00034-019-01164-4

Published: 14 June 2019

Volume 38, pages 5839–5860, (2019)

Circuits, Systems, and Signal Processing



Abstract

This paper investigates whispered speech enhancement based on a novel improved Mel frequency scale, which is derived from the characteristics of whispered speech. During synthesis, the whispered speech magnitude spectrum is recombined with a modified phase spectrum rather than with the phase spectrum of the noisy whispered speech. The phase correction makes the low-energy components of the new complex spectrum cancel more than the high-energy components, which removes as much background noise as possible. Moreover, the noise estimation parameter in the compensated phase is obtained by a new method. The algorithm seeks a trade-off among whispered speech distortion, noise reduction and the level of residual musical noise. Objective and subjective evaluations show that the proposed algorithm outperforms comparable whispered speech enhancement algorithms.
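
The cancellation effect described in the abstract can be illustrated with a minimal sketch of baseline phase spectrum compensation, in the spirit of the procedure of Stark et al. (2008). This is not the modified method proposed in the article, whose details are not given here. The function names `psc_frame` and `retained_energy`, the constant `lam`, the flat noise magnitude estimate and the test tone are all assumptions chosen for illustration. A real, antisymmetric offset scaled by the noise estimate is added to the spectrum before the phase is taken. The original magnitude is then recombined with that phase and only the real part of the inverse transform is kept.

```python
import numpy as np

def psc_frame(noisy_frame, noise_mag, lam=3.74):
    """Baseline phase spectrum compensation for one frame (illustrative sketch).

    noisy_frame: real-valued time-domain frame.
    noise_mag:   estimated noise magnitude per FFT bin (assumed to be given).
    lam:         compensation strength (assumed illustrative constant).
    """
    x = np.asarray(noisy_frame, dtype=float)
    n = x.size
    X = np.fft.fft(x)

    # Antisymmetric weighting: +1 on positive-frequency bins,
    # -1 on negative-frequency bins, 0 at DC (and at Nyquist when n is even).
    psi = np.zeros(n)
    psi[1:(n + 1) // 2] = 1.0
    psi[n // 2 + 1:] = -1.0

    # Real-valued offset that pushes conjugate bin pairs out of symmetry.
    X_comp = X + lam * psi * np.asarray(noise_mag, dtype=float)

    # Keep the noisy magnitude, use the compensated phase.
    S = np.abs(X) * np.exp(1j * np.angle(X_comp))

    # Taking the real part is where the out-of-symmetry pairs cancel.
    return np.fft.ifft(S).real

def retained_energy(frame, noise_mag):
    """Fraction of the frame energy that survives compensation."""
    out = psc_frame(frame, noise_mag)
    return float(np.sum(out ** 2) / np.sum(frame ** 2))

n = 64
t = np.arange(n)
tone = np.sin(2 * np.pi * 4 * t / n + 0.7)   # hypothetical test tone
noise_mag = np.ones(n)                        # hypothetical flat noise estimate

weak = retained_energy(0.01 * tone, noise_mag)    # spectrum far below the offset
strong = retained_energy(10.0 * tone, noise_mag)  # spectrum far above the offset
print(f"weak tone keeps {weak:.4f} of its energy, strong tone keeps {strong:.4f}")
```

In this sketch the weak tone, whose spectral magnitude is small compared with the offset, is almost entirely cancelled, while the strong tone passes nearly unchanged. With a zero noise estimate the frame is returned as-is.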



Acknowledgements

This project was supported by the National Key Research and Development Program of China (Grant No. 2017YFB0503500), the Natural Science Foundation of Jiangsu Province (Grant No. BK20171031) and the Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education (Grant No. 2017VGE01).

Author information

Authors and Affiliations

School of Physics and Technology, Nanjing Normal University, Nanjing, 210023, China
Yi Wei, Chen Li, Tianfeng Li & Yumin Zeng
Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
Chen Li
State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
Chen Li
Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
Chen Li


Corresponding author

Correspondence to Chen Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (MP4 18 kb)

Supplementary material 2 (MP4 33 kb)

Supplementary material 3 (MP4 19 kb)

Supplementary material 4 (MP4 31 kb)

Supplementary material 5 (MP4 18 kb)

Supplementary material 6 (MP4 18 kb)

Supplementary material 7 (MP4 18 kb)

Supplementary material 8 (MP4 18 kb)

Supplementary material 9 (MP4 33 kb)

Supplementary material 10 (MP4 33 kb)

Supplementary material 11 (MP4 33 kb)

Supplementary material 12 (MP4 33 kb)

Supplementary material 13 (MP4 33 kb)

Supplementary material 14 (MP4 33 kb)

Supplementary material 15 (MP4 33 kb)

Supplementary material 16 (MP4 33 kb)

Supplementary material 17 (MP4 19 kb)

Supplementary material 18 (MP4 19 kb)

Supplementary material 19 (MP4 19 kb)

Supplementary material 20 (MP4 19 kb)

Supplementary material 21 (MP4 31 kb)

Supplementary material 22 (MP4 31 kb)

Supplementary material 23 (MP4 31 kb)

Supplementary material 24 (MP4 31 kb)


About this article

Cite this article

Wei, Y., Li, C., Li, T. et al. Whispered Speech Enhancement Based on Improved Mel Frequency Scale and Modified Compensated Phase Spectrum. Circuits Syst Signal Process 38, 5839–5860 (2019). https://doi.org/10.1007/s00034-019-01164-4


Received: 17 October 2018
Revised: 02 June 2019
Accepted: 04 June 2019
Published: 14 June 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s00034-019-01164-4
