Article

Model-based talking face synthesis for anthropomorphic spoken dialog agent system

Authors:
Tatsuo Yotsukura

ATR Spoken Language Translation Research Laboratories, "Keihanna Science City", Kyoto, Japan

Shigeo Morishima

Seikei University, Musashinoshi, Tokyo, Japan

Satoshi Nakamura

ATR Spoken Language Translation Research Laboratories, "Keihanna Science City", Kyoto, Japan


MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia, November 2003, pages 351–354. https://doi.org/10.1145/957013.957089

Published: 2 November 2003


ABSTRACT

Toward natural human-machine communication, interface technologies based on speech and image information have been intensively developed. An anthropomorphic dialog agent, which integrates spoken dialog with natural facial expressions, is an ideal system for this purpose. This paper reports on our project to create a general-purpose toolkit for building easily customizable anthropomorphic agents. So far, almost no tools have been available that are intuitive, easy to understand, fully interactive, and open source; our anthropomorphic agent is designed to fulfill these requirements. The toolkit consists of four modules: multimodal dialog integration, speech recognition, speech synthesis, and face image synthesis. These modules are highly independent and are interlinked by simple communication protocols. In this paper, we focus on constructing the agent's face image synthesis. In this part, lip movement synchronized with the speech signal and facial emotion expression are the most important elements. We developed a face image synthesis module (FSM) that requires only one frontal face image and can be used by users of any skill level. A user's original agent can be generated simply by adjusting the generic wire-frame model to the frontal face image. This paper describes the overall system architecture and, in particular, the agent's face image synthesis module.
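The abstract states that an agent is created by adjusting a generic wire-frame model to a single frontal face image, but it does not describe the fitting procedure. The following is only a minimal sketch under an assumed approach, not the FSM's actual method: a few feature points (for example eye and mouth corners) are marked on both the generic model and the photograph, a 2D affine transform is estimated from those pairs by least squares, and every model vertex is then mapped into image coordinates as a coarse initial alignment. The names `fit_affine` and `warp_vertices` and the choice of an affine model are hypothetical and used only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("feature points are degenerate (e.g. collinear)")
    result = []
    for col in range(3):
        # Replace column `col` with the right-hand side and take the ratio.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        result.append(_det3(mc) / d)
    return result


def fit_affine(model_pts: List[Point], image_pts: List[Point]) -> Affine:
    """Hypothetical helper: least-squares affine map (a, b, c, d, e, f) with
    u = a*x + b*y + c and v = d*x + e*y + f, taking generic-model feature
    points (x, y) onto the matching image feature points (u, v)."""
    if len(model_pts) != len(image_pts) or len(model_pts) < 3:
        raise ValueError("need at least 3 matching point pairs")
    # Accumulate the normal equations (A^T A) p = A^T b, with rows A_i = [x_i, y_i, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atbu = [0.0] * 3
    atbv = [0.0] * 3
    for (x, y), (u, v) in zip(model_pts, image_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atbu[i] += row[i] * u
            atbv[i] += row[i] * v
    a, b, c = _solve3(ata, atbu)
    d, e, f = _solve3(ata, atbv)
    return (a, b, c, d, e, f)


def warp_vertices(params: Affine, vertices: List[Point]) -> List[Point]:
    """Hypothetical helper: map every wire-frame vertex into image coordinates."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in vertices]


# Usage: four model feature points and where the user marked them on the photo.
params = fit_affine([(0, 0), (1, 0), (0, 1), (1, 1)],
                    [(10, 5), (12, 5), (10, 7), (12, 7)])
print(warp_vertices(params, [(0.5, 0.5)]))
```

A single global affine fit only accounts for position, scale, rotation, and shear; local differences in face shape would still require per-vertex adjustment by the user.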

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Animation
    2. Graphics systems and interfaces
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Graphics input devices

Published in
MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia
November 2003
670 pages
ISBN:1581137222
DOI:10.1145/957013
General Chairs:
Lawrence Rowe, University of California, Berkeley
Harrick Vin, University of Texas, Austin
Program Chairs:
Thomas Plagemann, University of Oslo
Prashant Shenoy, University of Massachusetts, Amherst
John R. Smith, IBM T.J. Watson Research Center
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
anthropomorphic dialog agent
face image synthesis
facial animation
lip synchronization
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%

Article Metrics
- Total citations: 2
- Total downloads: 456
- Downloads (last 12 months): 4
- Downloads (last 6 weeks): 0