Personalized Voice Assignment Techniques for Synchronized Scenario Speech Output in Entertainment Systems

Kawamoto, Shin-ichi; Yotsukura, Tatsuo; Nakamura, Satoshi; Morishima, Shigeo

doi:10.1007/978-3-642-22024-1_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6774)

Included in the following conference series:

International Conference on Virtual and Mixed Reality

Abstract

This paper describes voice assignment techniques for synchronized scenario speech output in an instant casting movie system, which lets anyone appear as a movie star with his or her own voice and face. Two prototype systems were implemented, and both worked well for a wide range of participants, from children to the elderly.

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1211, Japan
Shin-ichi Kawamoto
OLM Digital Inc., Mikami Bldg. 2F, 1-18-10 Wakabayashi, Setagaya-ku, Tokyo, 154-0023, Japan
Tatsuo Yotsukura
National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
Shin-ichi Kawamoto & Satoshi Nakamura
Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
Shigeo Morishima

Editor information

Editors and Affiliations

Institute for Simulation and Training, University of Central Florida, 3100 Technology Parkway and 3280 Progress Drive, 32826, Orlando, FL, USA
Randall Shumaker

About this paper

Cite this paper

Kawamoto, Si., Yotsukura, T., Nakamura, S., Morishima, S. (2011). Personalized Voice Assignment Techniques for Synchronized Scenario Speech Output in Entertainment Systems. In: Shumaker, R. (eds) Virtual and Mixed Reality - Systems and Applications. VMR 2011. Lecture Notes in Computer Science, vol 6774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22024-1_20

DOI: https://doi.org/10.1007/978-3-642-22024-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22023-4
Online ISBN: 978-3-642-22024-1
eBook Packages: Computer Science, Computer Science (R0)
