DOI: 10.1145/957013.957089

Model-based talking face synthesis for anthropomorphic spoken dialog agent system

Published: 02 November 2003

ABSTRACT

Interface technologies based on speech and image information have been intensively developed in pursuit of natural human-machine communication. An anthropomorphic dialog agent, which integrates spoken dialog with natural facial expressions, is an ideal system of this kind. This paper reports on our project to create a general-purpose toolkit for building an easily customizable anthropomorphic agent. So far there have been almost no tools that are intuitive, easy to understand, fully interactive, and open source; our anthropomorphic agent is designed to fulfill these requirements. The toolkit consists of four modules: multimodal dialog integration, speech recognition, speech synthesis, and face image synthesis. These modules are highly modularized and interlinked by a simple communication protocol.

In this paper, we focus on the construction of the agent's face image synthesis. For this part, lip movement control synchronized with the speech signal and facial emotion expression are the most important components. We developed a face image synthesis module (FSM) that requires only one frontal face image and can be used by users of any skill level. A user's original agent can be generated through a simple adjustment between the frontal face image and the generic wire-frame model. The paper describes the overall system diagram and, specifically, the agent's face image synthesis part.

Published in

MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia
November 2003
670 pages
ISBN: 1581137222
DOI: 10.1145/957013

Copyright © 2003 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
