Abstract
Both prosody and spectral features are important for emotional speech synthesis. Besides prosodic effects, voice quality and articulation parameters should also be considered for modification in emotional speech synthesis systems. Traditionally, separate rules and filters are designed to process each of these parameters. This paper shows that by modifying the spectral envelope, voice quality and articulation can be adjusted as a whole, so that each parameter no longer needs to be modified separately according to hand-crafted rules. This makes the synthesis system more flexible, since an automatic spectral envelope model can be built with machine learning methods. A perception test further shows that the best emotional synthetic speech is obtained when both prosody and spectral features are modified.
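The abstract's central idea is that voice quality and articulation can be adjusted jointly by reshaping the spectral envelope rather than by per-parameter rules. The sketch below is a minimal illustration of that style of processing, not the authors' implementation: it assumes numpy, estimates a per-frame envelope by cepstral liftering, and applies a simple linear frequency warp as a generic stand-in for whatever envelope model the paper actually learns. All function names and the `alpha`/`lifter` parameters are hypothetical.

```python
import numpy as np

def cepstral_envelope(frame, lifter=30):
    """Estimate the spectral envelope of one windowed frame by
    low-quefrency liftering of the real cepstrum."""
    spec = np.abs(np.fft.rfft(frame)) + 1e-12
    ceps = np.fft.irfft(np.log(spec))
    ceps[lifter:len(ceps) - lifter] = 0.0        # keep only low quefrencies
    return np.exp(np.fft.rfft(ceps).real)        # smoothed magnitude envelope

def warp_envelope(env, alpha=1.1):
    """Warp the envelope along the frequency axis; alpha > 1 moves
    formant peaks upward. A crude stand-in for a learned envelope
    modification that changes voice quality and articulation together."""
    n = len(env)
    src = np.clip(np.arange(n) / alpha, 0, n - 1)
    return np.interp(src, np.arange(n), env)

def modify_frame(frame, alpha=1.1, lifter=30):
    """Replace the frame's envelope with a warped version while keeping
    the fine (excitation) structure and phase of the spectrum."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec) + 1e-12
    env = cepstral_envelope(frame, lifter)
    new_mag = mag / env * warp_envelope(env, alpha)
    return np.fft.irfft(new_mag * np.exp(1j * np.angle(spec)), n=len(frame))
```

In a full system this per-frame operation would run inside an overlap-add analysis/synthesis loop, with the warp (or a richer envelope transform) predicted per emotion by the learned model the abstract alludes to.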
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Shao, Y., Wang, Z., Han, J., Liu, T. (2005). Modifying Spectral Envelope to Synthetically Adjust Voice Quality and Articulation Parameters for Emotional Speech Synthesis. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3