ABSTRACT
We propose a new method for generating realistic facial animation in real time, using face mesh data corresponding to the fifty-six C+V (Consonant and Vowel) type morae that form the basis of Japanese speech. The method produces facial expressions by weighted addition of fifty-three face meshes, driven by a real-time mapping of the streamed voice to the registered morae. Both photogrammetric models and existing off-the-shelf head models can serve as face meshes, and the full pipeline, from modeling to live animation, takes less than two hours. A user study showed that the facial expressions our method produces during Japanese speech were rated more natural than those of two popular real-time facial animation methods: the English-based Oculus Lipsync and animation driven by volume intensity.
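To make the weighted-addition idea concrete, the following minimal sketch shows a blendshape-style mixer in which each registered mora contributes an offset from the neutral face, scaled by a per-frame weight. It is an illustration of the general technique, not the authors' implementation: the MoraBlender class, the classify_mora stub, the smoothing constant, and the toy meshes are all assumptions introduced here.

```python
import numpy as np

class MoraBlender:
    """Weighted-sum (blendshape-style) mixer over per-mora face meshes."""

    def __init__(self, base_mesh, mora_meshes, smoothing=0.25):
        # base_mesh: (V, 3) neutral-face vertex positions.
        # mora_meshes: mora label -> (V, 3) vertex positions for that mora.
        self.base = base_mesh
        # Store each mora shape as an offset (delta) from the neutral face,
        # so the blend is base + sum_i w_i * delta_i.
        self.deltas = {m: mesh - base_mesh for m, mesh in mora_meshes.items()}
        self.weights = {m: 0.0 for m in mora_meshes}
        self.smoothing = smoothing  # assumed ease-in/ease-out constant

    def update(self, target_weights):
        # Ease the current weights toward the targets to avoid popping
        # between audio frames, then apply the weighted addition of deltas.
        out = self.base.copy()
        for mora, delta in self.deltas.items():
            target = target_weights.get(mora, 0.0)
            self.weights[mora] += self.smoothing * (target - self.weights[mora])
            out += self.weights[mora] * delta
        return out

def classify_mora(audio_frame):
    # Hypothetical stand-in for the streaming audio -> mora mapping; a real
    # system would run a mora/phoneme recognizer on the audio frame here.
    return {"ka": 1.0}

# Per-frame driving loop on toy data (a 4-vertex "face"):
base = np.zeros((4, 3))
blender = MoraBlender(base, {"a": np.ones((4, 3)), "ka": np.full((4, 3), 0.5)})
frame_vertices = blender.update(classify_mora(None))
```

Storing meshes as deltas keeps the per-frame cost linear in the number of active morae, which is what makes a weighted sum of fifty-three meshes feasible at interactive rates.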
REFERENCES
- Visage Technologies AB. 2012. MPEG-4 Face and Body Animation (MPEG-4 FBA). https://www.visagetechnologies.com/uploads/2012/08/MPEG-4FBAOverview.pdf. (Accessed on 12/01/2022).
- Apple. 2020. ARFaceAnchor.BlendShapeLocation | Apple Developer Documentation. https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation. (Accessed on 12/01/2022).
- Gérard Bailly. 1997. Learning to speak. Sensori-motor control of speech movements. Speech Communication 22, 2 (1997), 251–267. https://doi.org/10.1016/S0167-6393(97)00025-3
- Gérard Bailly, Pascal Perrier, and Eric Vatikiotis-Bateson (Eds.). 2012. Audiovisual Speech Processing. Cambridge University Press, Cambridge, England.
- Preston Blair. 2012. Animation: Learn How to Draw Animated Cartoons. Literary Licensing, USA.
- Keith Brown and Sarah Ogilvie. 2008. Concise Encyclopedia of Languages of the World. Elsevier Science, London, England.
- AHS Co. 2020. VOICEROID2 Yukari Yuzuki. https://www.ah-soft.com/voiceroid/yukari/. (Accessed on 12/01/2022).
- DevelopW Corporation. 2020. iFacialMocap. https://www.ifacialmocap.com/tutorial/unity/. (Accessed on 12/01/2022).
- Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, and Michael J. Black. 2019. Capture, Learning, and Synthesis of 3D Speaking Styles. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, CA, USA, 10093–10103. https://doi.org/10.1109/CVPR.2019.01034
- Pif Edwards, Chris Landreth, Eugene Fiume, and Karan Singh. 2016. JALI: An Animator-Centric Viseme Model for Expressive Lip Synchronization. ACM Trans. Graph. 35, 4, Article 127 (July 2016), 11 pages. https://doi.org/10.1145/2897824.2925984
- Paul Ekman and Wallace V. Friesen. 1978. Facial Action Coding System. Environmental Psychology & Nonverbal Behavior (1978).
- T. Ezzat and T. Poggio. 1998. MikeTalk: A Talking Facial Display Based on Morphing Visemes. In Proceedings Computer Animation ’98 (Cat. No.98EX169). IEEE Computer Society, Philadelphia, USA, 96–102. https://doi.org/10.1109/CA.1998.681913
- Cletus G. Fisher. 1968. Confusions among visually perceived consonants. Journal of Speech and Hearing Research 11, 4 (1968), 796–804.
- hecomi. 2021. hecomi/uLipSync: A MFCC-based LipSync plugin for Unity using Burst Compiler. https://github.com/hecomi/uLipSync. (Accessed on 12/01/2022).
- Tero Karras, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. 2017. Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion. ACM Trans. Graph. 36, 4, Article 94 (July 2017), 12 pages. https://doi.org/10.1145/3072959.3073658
- J. P. Lewis, Ken Anjyo, Taehyun Rhee, Mengjie Zhang, Fred Pighin, and Zhigang Deng. 2014. Practice and Theory of Blendshape Facial Models. In Eurographics 2014 - State of the Art Reports, Sylvain Lefebvre and Michela Spagnuolo (Eds.). The Eurographics Association, Strasbourg, France, 199–218. https://doi.org/10.2312/egst.20141042
- Meta. 2018. Tech Note: Enhancing Oculus Lipsync with Deep Learning. https://developer.oculus.com/blog/tech-note-enhancing-oculus-lipsync-with-deep-learning/. (Accessed on 12/01/2022).
- Masahiro Mori, Karl F. MacDorman, and Norri Kageki. 2012. The Uncanny Valley [From the Field]. IEEE Robotics & Automation Magazine 19, 2 (2012), 98–100. https://doi.org/10.1109/MRA.2012.2192811
- Jason Osipa. 2010. Stop Staring: Facial Modeling and Animation Done Right (3 ed.). John Wiley & Sons, Chichester, England.
- T. Otake, G. Hatano, A. Cutler, and J. Mehler. 1993. Mora or Syllable? Speech Segmentation in Japanese. Journal of Memory and Language 32, 2 (1993), 258–278. https://doi.org/10.1006/jmla.1993.1014
- Frederick I. Parke. 1972. Computer Generated Animation of Faces. In Proceedings of the ACM Annual Conference - Volume 1 (Boston, Massachusetts, USA) (ACM ’72). Association for Computing Machinery, New York, NY, USA, 451–457. https://doi.org/10.1145/800193.569955
- R3DS. 2020. Wrapping — R3DS Wrap documentation. https://www.russian3dscanner.com/docs/Wrap3/Nodes/Wrapping/Wrapping.html. (Accessed on 12/01/2022).
- Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, and Yaser Sheikh. 2021. MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement. arXiv:2104.08223. https://doi.org/10.48550/ARXIV.2104.08223
- Tim Riney and Janet Anderson-Hsieh. 1993. Japanese pronunciation of English. JALT Journal 15, 1 (1993), 21–36.
- Keith Brown and Ronald E. Asher. 2006. Encyclopedia of Language and Linguistics, 14-Volume Set (2 ed.). Elsevier Science & Technology, Amsterdam, Netherlands, 149–156.
- Sarah L. Taylor, Moshe Mahler, Barry-John Theobald, and Iain Matthews. 2012. Dynamic Units of Visual Speech. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation, Jehee Lee and Paul Kry (Eds.). The Eurographics Association, Lausanne, Switzerland, 275–284. https://doi.org/10.2312/SCA/SCA12/275-284
- Lance Williams. 1990. Performance-Driven Facial Animation. SIGGRAPH Comput. Graph. 24, 4 (September 1990), 235–242. https://doi.org/10.1145/97880.97906
- Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, and Karan Singh. 2018. Visemenet: Audio-Driven Animator-Centric Speech Animation. ACM Trans. Graph. 37, 4, Article 161 (July 2018), 10 pages. https://doi.org/10.1145/3197517.3201292