Abstract
Statistical parametric speech synthesis typically relies on context-dependent Hidden Markov Models (HMMs) built with decision tree clustering. However, decision tree clustering is restricted to a rigid partition of the feature space, which causes the speech parameters generated from the HMM to be over-smoothed. In this paper, a Deep Neural Network (DNN) is used to replace decision tree clustering, and we propose a speech synthesis improvement algorithm based on post-filter parameters. The method enhances the formant regions of the synthesized speech spectrum by selecting the most suitable filter parameter according to the flatness of the spectrum. The experimental results show that the DNN effectively mitigates the over-smoothing of the generated parameters. Furthermore, the improved post-filter algorithm increases the naturalness of the synthesized speech.
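The idea of choosing a post-filter parameter from the flatness of the spectrum can be illustrated with a minimal sketch. It assumes the common definition of spectral flatness (geometric mean divided by arithmetic mean of the power spectrum) and a purely hypothetical linear rule that maps flatness to an enhancement strength. The function names, the `beta_min` and `beta_max` bounds, and the mapping itself are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Values near 1 indicate a flat (noise-like or over-smoothed) spectrum,
    values near 0 indicate a spectrum dominated by a few strong peaks.
    """
    p = np.asarray(power_spectrum, dtype=float) + eps  # avoid log(0)
    geometric_mean = np.exp(np.mean(np.log(p)))
    arithmetic_mean = np.mean(p)
    return float(geometric_mean / arithmetic_mean)


def select_postfilter_strength(flatness, beta_min=0.0, beta_max=0.4):
    """Hypothetical rule (an assumption, not the paper's formula):
    the flatter the spectrum, the stronger the formant enhancement."""
    flatness = min(max(flatness, 0.0), 1.0)  # clamp to [0, 1]
    return beta_min + (beta_max - beta_min) * flatness


# Two toy spectra: one perfectly flat, one with a single strong peak.
flat_spectrum = np.ones(256)
peaky_spectrum = np.ones(256)
peaky_spectrum[40] = 1000.0

for name, spectrum in [("flat", flat_spectrum), ("peaky", peaky_spectrum)]:
    sfm = spectral_flatness(spectrum)
    beta = select_postfilter_strength(sfm)
    print(f"{name}: flatness={sfm:.3f}, strength={beta:.3f}")
```

In this sketch the flat spectrum receives the largest strength and the peaky one a smaller value, which mirrors the intuition that over-smoothed frames need more formant enhancement. A real system would tune the mapping and the bounds on listening tests.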
Acknowledgements
This paper is supported by the URTP project of School of SME, the demonstration course project of Xidian University, and the Ministry of Education cooperation collaborative education project.
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dong, S., Li, C., Zhang, H. (2020). An Improved Speech Synthesis Algorithm with Post filter Parameters Based on Deep Neural Network. In: Liang, Q., Liu, X., Na, Z., Wang, W., Mu, J., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2018. Lecture Notes in Electrical Engineering, vol 517. Springer, Singapore. https://doi.org/10.1007/978-981-13-6508-9_30
DOI: https://doi.org/10.1007/978-981-13-6508-9_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6507-2
Online ISBN: 978-981-13-6508-9
eBook Packages: Engineering, Engineering (R0)