Research article
DOI: 10.1145/3582700.3582705

Generation of realistic facial animation of a CG avatar speaking a moraic language

Published: 14 March 2023

ABSTRACT

We propose a new method for generating realistic facial animation in real time, using face mesh data corresponding to the fifty-six C+V (consonant and vowel) type morae that form the basis of Japanese speech. The method produces facial expressions by weighted addition of fifty-three face meshes, driven by a real-time mapping of the streaming voice onto the registered morae. Both photogrammetric models and existing off-the-shelf head models can be used as face meshes, and the whole pipeline, from modeling to live animation of natural speech expressions, takes less than two hours. A user study showed that the facial expressions our method produces during Japanese speech appear more natural than those of popular real-time facial animation methods, namely the English-based Oculus Lipsync and volume-intensity-based animation.
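
The per-frame operation described above is, in essence, a delta-blendshape sum over the registered mora meshes. The sketch below is a minimal illustration of that weighted addition, not the authors' implementation; the mora labels, the weight dictionary, the array shapes, and the audio-to-mora mapper that would supply the weights each frame are all assumptions made for the example.

    import numpy as np

    def blend_mora_meshes(neutral, mora_meshes, mora_weights):
        """Blend a neutral face mesh with registered per-mora face meshes.

        neutral      : (V, 3) array of neutral-pose vertex positions.
        mora_meshes  : dict mapping a mora label (e.g. "ka") to a (V, 3)
                       array of vertex positions for that mora's mesh.
        mora_weights : dict mapping mora labels to weights in [0, 1],
                       e.g. produced each frame by an audio-to-mora
                       mapper (hypothetical here, not shown).
        Returns the blended (V, 3) vertex positions for the current frame.
        """
        blended = neutral.copy()
        for mora, weight in mora_weights.items():
            if weight <= 0.0:
                continue
            # Delta blendshape formulation: add the weighted difference
            # between the mora mesh and the neutral mesh.
            blended += weight * (mora_meshes[mora] - neutral)
        return blended

    # Tiny usage example: two morae contributing to one frame, as when
    # transitioning between sounds.
    V = 4  # placeholder vertex count; a real head mesh has thousands
    rng = np.random.default_rng(0)
    neutral = rng.normal(size=(V, 3))
    mora_meshes = {"a": neutral + 0.10, "ka": neutral - 0.05}
    frame_weights = {"a": 0.7, "ka": 0.3}
    print(blend_mora_meshes(neutral, mora_meshes, frame_weights))

In a real-time engine such as Unity, the same sum would typically be expressed through blendshape weights on a skinned mesh rather than explicit vertex arrays.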


Published in

AHs '23: Proceedings of the Augmented Humans International Conference 2023
March 2023, 395 pages
ISBN: 9781450399845
DOI: 10.1145/3582700
Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
