
Speech Enhancement Using Generative Adversarial Network (GAN)

  • Conference paper
Hybrid Intelligent Systems (HIS 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 420)

Abstract

Most restoration techniques for loss of voice produce whispered or monotonous speech. Besides reduced intelligibility, such speech lacks expressiveness and naturalness because of (a) the absence of pitch, which yields whispered speech, and (b) artificially generated pitch, which yields monotone speech. This work presents a neural network method for estimating a fully voiced speech waveform from an alaryngeal whispered speech waveform. To this end, a speech enhancement method based on Generative Adversarial Networks (GANs) is implemented, with the aim of performing whispered-to-voiced speech conversion and handling speech reconstruction tasks.
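The abstract does not state which architecture or training objective the GAN uses. As a point of reference only, the sketch below assumes a SEGAN-style conditional objective (Pascual et al., 2017): the discriminator scores (waveform, whispered input) pairs with a least-squares loss, pushing real voiced pairs toward 1 and generated pairs toward 0, while the generator combines a least-squares adversarial term with a weighted L1 distance to the voiced target waveform. The function names, the plain-list inputs (precomputed discriminator scores and waveform samples), and the default weight `lam=100.0` are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("expected at least one value")
    return sum(values) / len(values)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Least-squares GAN loss for the discriminator.

    d_real: scores for (voiced target, whispered input) pairs.
    d_fake: scores for (generated waveform, whispered input) pairs.
    Real pairs are pushed toward 1 and generated pairs toward 0.
    """
    real_term = 0.5 * _mean([(d - 1.0) ** 2 for d in d_real])
    fake_term = 0.5 * _mean([d ** 2 for d in d_fake])
    return real_term + fake_term


def generator_loss(
    d_fake: Sequence[float],
    generated: Sequence[float],
    target: Sequence[float],
    lam: float = 100.0,  # assumed weight of the L1 term (hypothetical default)
) -> float:
    """Least-squares adversarial term plus a weighted L1 waveform term.

    The adversarial term rewards generated pairs that the discriminator
    scores as real (close to 1); the L1 term keeps the generated waveform
    close to the voiced target, sample by sample.
    """
    if len(generated) != len(target):
        raise ValueError("generated and target waveforms must have equal length")
    adversarial = 0.5 * _mean([(d - 1.0) ** 2 for d in d_fake])
    # Mean absolute error over samples (an averaged rather than summed L1 norm).
    l1 = _mean([abs(g - t) for g, t in zip(generated, target)])
    return adversarial + lam * l1
```

For example, a discriminator that scores every real pair as 1 and every generated pair as 0 has zero loss, whereas the generator is penalized both for being detected as fake and for deviating from the target samples. Averaging the L1 term instead of summing it keeps its scale independent of the waveform length.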

Author information

Correspondence to Mahmudul Huq.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Huq, M., Maskeliunas, R. (2022). Speech Enhancement Using Generative Adversarial Network (GAN). In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_26
