ABSTRACT
To address the need for an interoperable, cross-platform exchange format for user representations (avatars) in immersive realities, ISO/IEC JTC 1/SC 29/WG 03 (MPEG Systems) has standardized a Scene Description framework in ISO/IEC 23090-14 [ISO/IEC 2023]. It serves as a baseline user representation format that enriches interactive experiences with 3D objects in an immersive scene. This work presents the MPEG Original Reference Geometric Avatar Neutral (Morgan), a humanoid avatar specified as informative content by the MPEG-I Scene Description (MPEG-I SD) standardization group. Morgan is a generic avatar representation that facilitates interactivity and manipulation in immersive realities. It comprises a complete body mesh with a realistic appearance, a hierarchical skeletal representation, blend shapes, eye globes, a jaw with teeth, and a semantic representation of human body parts.
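Since the MPEG-I Scene Description framework builds on glTF 2.0 [The Khronos Group 2021], the components listed above map naturally onto glTF constructs. The following is a minimal, hypothetical sketch (not the normative Morgan asset; all node names and indices are illustrative) of how a skinned body mesh, a hierarchical skeleton, and blend shapes (glTF morph targets) fit together in one document:

```python
import json

# Hypothetical, heavily simplified glTF 2.0-style document. Accessor/buffer
# definitions are omitted; attribute indices are placeholders.
gltf = {
    "asset": {"version": "2.0"},
    "scenes": [{"nodes": [0]}],
    # Hierarchical skeleton: each node references its children by index.
    "nodes": [
        {"name": "hips", "children": [1, 2]},
        {"name": "spine", "children": [3]},
        {"name": "leftUpLeg"},
        {"name": "head"},
        # Node carrying the skinned body mesh.
        {"name": "body", "mesh": 0, "skin": 0},
    ],
    # The skin binds the mesh to the skeleton joints above.
    "skins": [{"joints": [0, 1, 2, 3]}],
    "meshes": [{
        "primitives": [{
            "attributes": {"POSITION": 0, "JOINTS_0": 1, "WEIGHTS_0": 2},
            # Blend shapes are expressed as glTF morph targets.
            "targets": [{"POSITION": 3}, {"POSITION": 4}],
        }],
        # Default morph-target weights (e.g. two facial shapes at rest).
        "weights": [0.0, 0.0],
    }],
}

# Serialize the root joint of the skeleton hierarchy.
print(json.dumps(gltf["nodes"][0]))
```

A real interoperable avatar would additionally carry semantic labels for body parts and the eye/jaw sub-meshes described in the abstract; the exact mechanism is defined by the MPEG-I SD amendment, not by this sketch.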
- T. Beeler, B. Bickel, P. Beardsley, B. Sumner, and M. Gross. 2010. High-quality single-shot capture of facial geometry. In ACM SIGGRAPH, Vol. 29. ACM, New York, NY, USA, 40:1–40:9. https://doi.org/10.1145/1778765.1778777
- A. Bulat and G. Tzimiropoulos. 2017. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks). In ICCV. 1021–1030. https://doi.org/10.1109/ICCV.2017.116
- Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou. 2014. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20, 3 (2014), 413–425. https://doi.org/10.1109/TVCG.2013.249
- J. Carranza, C. Theobalt, M. A. Magnor, and H.-P. Seidel. 2003. Free-viewpoint Video of Human Actors. ACM Trans. Graph. 22, 3 (July 2003), 569–577. https://doi.org/10.1145/882262.882309
- A. Collet, M. Chuang, P. Sweeney, D. Gillett, D. Evseev, D. Calabrese, H. Hoppe, A. Kirk, and S. Sullivan. 2015. High-quality Streamable Free-viewpoint Video. ACM Trans. Graph. 34, 4, Article 69 (July 2015), 69:1–69:13. https://doi.org/10.1145/2766945
- F. Danieau, I. Gubins, N. Olivier, O. Dumas, B. Denis, T. Lopez, N. Mollet, B. Frager, and Q. Avril. 2019. Automatic Generation and Stylization of 3D Facial Rigs. In 2019 IEEE Conf. on Virtual Reality and 3D User Interfaces. 784–792. https://doi.org/10.1109/VR.2019.8798208
- C. Doukas, S. Zafeiriou, and V. Sharmanska. 2021. HeadGAN: One-shot Neural Head Synthesis and Editing. In ICCV. https://doi.org/10.1109/iccv48922.2021.01413 arXiv:2012.08261
- P. Ekman, W. V. Friesen, and S. Ancoli. 1980. Facial signs of emotional experience. Journal of Personality and Social Psychology 39, 6 (1980), 1125–1134. https://doi.org/10.1037/h0077722
- R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. 2008. Multi-PIE. In FG. 1–8. https://doi.org/10.1109/AFGR.2008.4813399
- ISO/IEC. 2006. ISO/IEC 19774:2006 Information technology — Computer graphics and image processing — Humanoid Animation (H-Anim). Standard ISO/IEC 19774:2006. International Organization for Standardization.
- ISO/IEC. 2020. ISO/IEC 23005 MPEG-V, Information technology — Media context and control. Standard ISO/IEC 23005. International Organization for Standardization.
- ISO/IEC. 2023. Potential improvements of ISO/IEC 23090-14 CDAM 2 Support for haptics, augmented reality, avatars, interactivity, MPEG-I audio and lighting. Standard ISO/IEC 23090-14:2023. International Organization for Standardization.
- M. Kowalski, J. Naruniec, and T. Trzcinski. 2017. Deep Alignment Network: A Convolutional Neural Network for Robust Face Alignment. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2034–2043. https://doi.org/10.1109/CVPRW.2017.254 arXiv:1706.01789
- R. Li, K. Bladin, Y. Zhao, C. Chinara, O. Ingraham, P. Xiang, X. Ren, P. Prasad, B. Kishore, J. Xing, and H. Li. 2020. Learning Formation of Physically-Based Face Attributes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 3407–3416. https://doi.org/10.1109/CVPR42600.2020.00347 arXiv:2004.03458
- Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. 2017. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics 36, 6 (2017). https://doi.org/10.1145/3130800.3130813
- Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graph. 34, 6 (2015).
- S. Ma, T. Simon, J. Saragih, D. Wang, Y. Li, F. De La Torre, and Y. Sheikh. 2021. Pixel Codec Avatars. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 64–73. https://doi.org/10.1109/CVPR46437.2021.00013 arXiv:2104.04638
- R. McDonnell, M. Breidt, and H. Bülthoff. 2012. Render me real?: Investigating the effect of render style on the perception of animated virtual humans. ACM Trans. Graph. 31, 4 (2012), 1–11. https://doi.org/10.1145/2185520.2185587
- N. Olivier, K. Baert, F. Danieau, F. Multon, and Q. Avril. 2023. FaceTuneGAN: Face autoencoder for convolutional expression transfer using neural generative adversarial networks. Computers & Graphics 110 (2023), 69–85. https://doi.org/10.1016/j.cag.2022.12.004
- N. Olivier, G. Kerbiriou, F. Argelaguet, Q. Avril, F. Danieau, P. Guillotel, L. Hoyet, and F. Multon. 2022. Study on Automatic 3D Facial Caricaturization: From Rules to Deep Learning. Frontiers in Virtual Reality 2 (2022). https://doi.org/10.3389/frvir.2021.785104
- Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas Vetter. 2009. A 3D face model for pose and illumination invariant face recognition. In 6th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2009). 296–301. https://doi.org/10.1109/AVSS.2009.58
- Marius Preda and Françoise Preteux. 2002. Critic review on MPEG-4 face and body animation. In IEEE International Conference on Image Processing, Vol. 3. https://doi.org/10.1109/icip.2002.1039018
- C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic. 2013. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. In 2013 IEEE International Conference on Computer Vision Workshops (ICCV Workshops 2013), Sydney, Australia, December 1–8, 2013. IEEE Computer Society, 397–403. https://doi.org/10.1109/ICCVW.2013.59
- Maria V. Sanchez-Vives and Mel Slater. 2005. From presence to consciousness through virtual reality. Nature Reviews Neuroscience 6 (2005), 332–339. https://doi.org/10.1038/nrn1651
- J. Starck and A. Hilton. 2007. Surface Capture for Performance-Based Animation. IEEE Computer Graphics and Applications 27, 3 (2007), 21–31. https://doi.org/10.1109/MCG.2007.68
- K. Teotia, B. Mallikarjun, X. Pan, H. Kim, P. Garrido, M. Elgharib, and C. Theobalt. 2023. HQ3DAvatar: High Quality Controllable 3D Head Avatar. arXiv:2303.14471 [cs.CV]
- A. Tewari, M. Zollhöfer, H. Kim, P. Garrido, F. Bernard, P. Pérez, and C. Theobalt. 2017. MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction. In ICCV.
- The Khronos Group. 2021. glTF™ 2.0 Specification. https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html
- Unreal Engine. 2021. Grooming for Real-Time Realism: Hair and Fur with Unreal Engine. https://www.unrealengine.com/en-US/blog/realistic-hair-and-fur-with-unreal-engine-get-the-white-paper
- VRM Consortium. 2019. A General Incorporated Association Dedicated to Developing and Disseminating the "VRM" File Format for 3D Avatars. https://vrm-consortium.org/en/#vrm
- T.-C. Wang, A. Mallya, and M.-Y. Liu. 2021. One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing. In CVPR. https://doi.org/10.1109/cvpr46437.2021.00991 arXiv:2011.15126
- E. Wood, T. Baltrušaitis, C. Hewitt, S. Dziadzio, T. Cashman, and J. Shotton. 2021. Fake it till you make it: Face analysis in the wild using synthetic data alone. In Proceedings of the IEEE International Conference on Computer Vision. 3661–3671. https://doi.org/10.1109/ICCV48922.2021.00366 arXiv:2109.15102