Speech Enhancement Using Generative Adversarial Network (GAN)

Huq, Mahmudul; Maskeliunas, Rytis

doi:10.1007/978-3-030-96305-7_26

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 420)

Included in the following conference series:

International Conference on Hybrid Intelligent Systems

Abstract

Most restoration techniques for loss of voice result in whispered and monotonous speech. Beyond reduced intelligibility, this type of speech is poor in expressiveness and naturalness because (a) the lack of pitch results in whispered speech, and (b) artificial pitch production results in monotone speech. This work presents a neural network method for estimating a fully voiced speech waveform from an alaryngeal whispered speech waveform. A speech enhancement method based on Generative Adversarial Networks (GANs) is implemented. The aim of this GAN implementation is to perform whispered-to-voiced speech conversion and to handle speech reconstruction tasks.
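The abstract does not state which adversarial objective is used, so the following is only a minimal sketch of how a GAN for waveform conversion could be scored, not the chapter's implementation. It assumes a least-squares adversarial loss plus an L1 waveform term; the function names and the `l1_weight` parameter are hypothetical and chosen for illustration.

```python
from typing import Sequence


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Least-squares discriminator loss (an assumed choice of objective).

    d_real: discriminator scores for genuine voiced speech.
    d_fake: discriminator scores for generator output (converted whispers).
    Scores on real speech are pushed toward 1, scores on generated speech toward 0.
    """
    real_term = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def generator_loss(
    d_fake: Sequence[float],
    generated: Sequence[float],
    target: Sequence[float],
    l1_weight: float = 100.0,  # hypothetical weighting, not taken from the chapter
) -> float:
    """Adversarial term plus a weighted L1 distance to the voiced target waveform."""
    # The generator wants the discriminator to score its output as real (1).
    adversarial = 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)
    # Mean absolute sample difference between generated and reference waveforms.
    l1 = sum(abs(g - t) for g, t in zip(generated, target)) / len(target)
    return adversarial + l1_weight * l1
```

In such a setup the two losses would be minimised in alternation: the discriminator learns to separate voiced speech from converted whispers, while the generator learns both to fool it and to stay close to the reference waveform.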

Author information

Authors and Affiliations

Department of Multimedia Engineering, Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
Mahmudul Huq & Rytis Maskeliunas

Corresponding author

Correspondence to Mahmudul Huq.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France
Patrick Siarry
Department of Computer Science, Università degli Studi di Milano, Milan, Milano, Italy
Vincenzo Piuri
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
University of Bari, Bari, Italy
Gabriella Casalino
Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Ontario Tech University, Oshawa, ON, Canada
Patrick Hung

About this paper

Cite this paper

Huq, M., Maskeliunas, R. (2022). Speech Enhancement Using Generative Adversarial Network (GAN). In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_26

DOI: https://doi.org/10.1007/978-3-030-96305-7_26
Published: 04 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96304-0
Online ISBN: 978-3-030-96305-7
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
