Abstract
Most voice restoration techniques for people who have lost their voice produce whispered or monotonous speech. Besides being less intelligible, such speech lacks expressiveness and naturalness for two reasons: (a) the absence of pitch, which yields whispered speech, and (b) artificial pitch generation, which yields monotone speech. This work presents a neural network method for estimating a fully voiced speech waveform from an alaryngeal whispered speech waveform. A speech enhancement method based on Generative Adversarial Networks (GANs) is implemented. The aim of this GAN implementation is to perform whispered-to-voiced speech conversion and to handle speech reconstruction tasks.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huq, M., Maskeliunas, R. (2022). Speech Enhancement Using Generative Adversarial Network (GAN). In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_26
DOI: https://doi.org/10.1007/978-3-030-96305-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96304-0
Online ISBN: 978-3-030-96305-7
eBook Packages: Intelligent Technologies and Robotics (R0)