Measuring the Effect of Reverberation on Statistical Parametric Speech Synthesis

Coto-Jiménez, Marvin

doi:10.1007/978-3-030-41005-6_25

Marvin Coto-Jiménez ORCID: orcid.org/0000-0002-6833-9938

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1087)

Included in the following conference series:

Latin American High Performance Computing Conference

Abstract

Text-to-speech (TTS) synthesis is the technique of generating intelligible speech from a given text. The most recent TTS techniques are based on machine learning: the systems learn the relationship between linguistic specifications and the corresponding parameters of the speech signal. Given the growing interest in implementing verbal communication systems in devices such as cell phones, car navigation systems, and personal assistants, it is important to be able to use speech data from many sources. The speech recordings available for this purpose are not always of the best quality, for example when an artificial voice is created from historical recordings, or from a person for whom only a small set of recordings exists. In these cases, the adverse conditions in the data pose an additional challenge. Reverberation is one such condition. It results from the different paths that a speech signal can take through an environment before it is recorded by a microphone. In the present work, we quantitatively explore the effect of different levels of reverberation on the quality of artificial voices generated from such reverberant recordings. The results show that the quality of the generated artificial speech is considerably affected by any level of reverberation. Thus, the application of speech enhancement algorithms should always be considered before and after any TTS process.
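As a rough illustration of the reverberation model described in the abstract (multiple delayed, attenuated paths arriving at the microphone), the sketch below convolves a dry signal with a room impulse response. It is not the experimental setup of this chapter: the function names `synthetic_rir` and `add_reverb`, the 16 kHz sample rate, and the exponentially decaying noise used as an impulse response are all assumptions made for the example. The reverberation level is controlled here by a hypothetical `rt60_s` parameter, the time for the response to decay by 60 dB.

```python
import numpy as np


def synthetic_rir(rt60_s, sample_rate=16000, seed=0):
    """Hypothetical room impulse response: white noise shaped by an
    exponential envelope that falls by 60 dB after rt60_s seconds."""
    rng = np.random.default_rng(seed)
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    # An amplitude factor of 10**-3 corresponds to -60 dB at t = rt60_s.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path from speaker to microphone
    return rir


def add_reverb(dry, rir):
    """Each tap of the impulse response contributes one delayed,
    scaled copy of the dry signal; convolution sums those copies."""
    return np.convolve(dry, rir)


# Usage: a one-second 220 Hz tone stands in for a speech recording.
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
dry = np.sin(2.0 * np.pi * 220.0 * time)

# Longer (assumed) decay times give stronger reverberation.
reverberant = {rt60: add_reverb(dry, synthetic_rir(rt60, sample_rate))
               for rt60 in (0.2, 0.5, 1.0)}
```

In a study like the one summarized here, measured impulse responses from real rooms would normally replace the synthetic one, and the reverberant recordings would then be used as training data for the voice.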

Supported by the University of Costa Rica.

Acknowledgements

This work was supported by the University of Costa Rica (UCR), Project No. 322-B9-105.

Author information

Authors and Affiliations

PRIS-Lab, Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San Pedro, Costa Rica
Marvin Coto-Jiménez

Corresponding author

Correspondence to Marvin Coto-Jiménez.

Editor information

Editors and Affiliations

Costa Rica Institute of Technology, Cartago, Costa Rica
Juan Luis Crespo-Mariño
Costa Rica Institute of Technology, Cartago, Costa Rica
Esteban Meneses-Rojas

About this paper

Cite this paper

Coto-Jiménez, M. (2020). Measuring the Effect of Reverberation on Statistical Parametric Speech Synthesis. In: Crespo-Mariño, J., Meneses-Rojas, E. (eds) High Performance Computing. CARLA 2019. Communications in Computer and Information Science, vol 1087. Springer, Cham. https://doi.org/10.1007/978-3-030-41005-6_25

DOI: https://doi.org/10.1007/978-3-030-41005-6_25
Published: 12 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41004-9
Online ISBN: 978-3-030-41005-6
eBook Packages: Computer Science; Computer Science (R0)
