Abstract
Generating real-time animations for customized avatars is becoming of paramount importance, especially in Virtual Try-On applications, which allow customers to explore or “try on” products virtually. Despite the numerous benefits of this technology, some aspects still hinder its applicability in real scenarios. The first limitation concerns the difficulty of generating expressive avatar animations. Moreover, potential customers often express concerns regarding the fidelity of the animations. To overcome these two limitations, this paper presents a framework for animating customized avatars based on state-of-the-art techniques, focusing mainly on the animation of the customized avatars. More specifically, the framework encompasses two components. The first automates the operations needed to generate the data structures used for avatar animation; it assumes that the avatar’s mesh is described through the Sparse Unified Part-Based Human Representation (SUPR). The second animates the avatar through motion capture, making use of the MediaPipe Holistic pipeline. Experimental evaluations were carried out to assess the solutions proposed for pose beautification and joint estimation. Results demonstrated improvements in the quality of the reconstructed animation from both an objective and a subjective point of view.
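For readers who want to experiment with the motion-capture side of such a pipeline, the sketch below shows how per-frame 3D body landmarks can be extracted from a monocular video with the Python API of MediaPipe Holistic. It is a minimal illustration under stated assumptions, not the authors' implementation: the input file name and the landmark handling are placeholders, and the paper's actual system integrates MediaPipe with Unity via the MediaPipeUnityPlugin listed in the Notes.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

# Open a monocular video (placeholder file name) and run Holistic on each frame.
cap = cv2.VideoCapture("walk.mp4")
with mp_holistic.Holistic(model_complexity=1,
                          refine_face_landmarks=True) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks:
            # 33 body landmarks in metric 3D coordinates with the origin
            # between the hips; hands and face are reported separately in
            # results.left_hand_landmarks / right_hand_landmarks /
            # face_landmarks.
            nose = results.pose_world_landmarks.landmark[
                mp_holistic.PoseLandmark.NOSE]
            print(f"nose: ({nose.x:.3f}, {nose.y:.3f}, {nose.z:.3f})")
cap.release()
```

In a complete system, these per-frame landmark positions would then be converted into joint rotations for the avatar's skeleton (SUPR, in the paper's case) before being applied to the mesh.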
Notes
1. Statistics on online sales: https://tinyurl.com/fashion-ecommerce-value.
2. Blender: https://www.blender.org/.
3. Autodesk Maya: https://tinyurl.com/bda6pcwa.
4. Holistic landmarks detection: https://tinyurl.com/yc3m6b9v.
5. Mixamo: https://www.mixamo.com/#/.
6. Auto-Rig Pro: https://blendermarket.com/products/auto-rig-pro.
7. Rigify: https://tinyurl.com/rigify.
8. Unity: https://unity.com/.
9. MediaPipe guide: https://tinyurl.com/MediaPipe-Holistic.
10. MediaPipeUnityPlugin: https://github.com/homuler/MediaPipeUnity.
11. Face landmark detection guide: https://tinyurl.com/3vx5fjv9.
References
Achenbach, J., Waltemate, T., Latoschik, M.E., Botsch, M.: Fast generation of realistic virtual humans. In: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 1–10 (2017)
Arora, R., Kazi, R.H., Kaufman, D.M., Li, W., Singh, K.: MagicalHands: mid-air hand gestures for animating in VR. In: Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 463–477 (2019)
Baran, I., Popović, J.: Automatic rigging and animation of 3D characters. ACM Trans. Graph. 26(3), 72-es (2007)
Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., Grundmann, M.: BlazePose: on-device real-time body pose tracking. arXiv preprint arXiv:2006.10204 (2020)
Blázquez, M.: Fashion shopping in multichannel retail: the role of technology in enhancing the customer experience. Int. J. Electron. Commer. 18(4), 97–116 (2014)
Cannavò, A., Lamberti, F., et al.: A virtual character posing system based on reconfigurable tangible user interfaces and immersive virtual reality. In: Proceedings of the Smart Tools and Applications in Graphics, pp. 1–11. Eurographics (2018)
Cannavò, A., Pratticò, F.G., Ministeri, G., Lamberti, F.: A movement analysis system based on immersive virtual reality and wearable technology for sport training. In: Proceedings of the International Conference on Virtual Reality, pp. 26–31 (2018)
Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: FaceWarehouse: a 3D facial expression database for visual computing. IEEE Trans. Vis. Comput. Graph. 20(3), 413–425 (2013)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021)
Cudeiro, D., Bolkart, T., Laidlaw, C., Ranjan, A., Black, M.J.: Capture, learning, and synthesis of 3D speaking styles. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10101–10111 (2019)
Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2334–2343 (2017)
Gao, Y., Petersson Brooks, E., Brooks, A.L.: The performance of self in the context of shopping in a virtual dressing room system. In: Nah, F.F.-H. (ed.) HCIB 2014. LNCS, vol. 8527, pp. 307–315. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07293-7_30
Hangaragi, S., Singh, T., Neelima, N.: Face detection and recognition using face mesh and deep neural network. Procedia Comput. Sci. 218, 741–749 (2023)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2013)
James, D.L., Twigg, C.D.: Skinning mesh animations. ACM Trans. Graph. 24(3), 399–407 (2005)
John, V., Trucco, E.: Charting-based subspace learning for video-based human action classification. Mach. Vis. Appl. 25, 119–132 (2014)
Knöpfle, C., Jung, Y.: The virtual human platform: simplifying the use of virtual characters. Int. J. Virtual Reality 5(2), 25–30 (2006)
Kulkarni, S., Deshmukh, S., Fernandes, F., Patil, A., Jabade, V.: PoseAnalyser: a survey on human pose estimation. SN Comput. Sci. 4(2), 136 (2023)
Lagė, A., Ancutienė, K.: Virtual try-on technologies in the clothing industry: basic block pattern modification. Int. J. Cloth. Sci. Technol. (2019)
Lee, H., Xu, Y.: Classification of virtual fitting room technologies in the fashion industry: from the perspective of consumer experience. Int. J. Fashion Des. Technol. Educ. 13(1), 1–10 (2020)
Liu, Y., Liu, Y., Xu, S., Cheng, K., Masuko, S., Tanaka, J.: Comparing VR- and AR-based try-on systems using personalized avatars. Electronics 9(11), 1814 (2020)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 1–16 (2015)
Lugaresi, C., et al.: MediaPipe: a framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)
Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: archive of motion capture as surface shapes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5442–5451 (2019)
Maji, D., Nagori, S., Mathew, M., Poddar, D.: YOLO-Pose: enhancing YOLO for multi person pose estimation using object keypoint similarity loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2637–2646 (2022)
Nunnari, F., Heloir, A.: Yet another low-level agent handler. Comput. Animat. Virtual Worlds 30(3–4), e1891 (2019). https://doi.org/10.1002/cav.1891, https://onlinelibrary.wiley.com/doi/abs/10.1002/cav.1891
Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 598–613. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_36
Osman, A.A., Bolkart, T., Tzionas, D., Black, M.J.: SUPR: a sparse unified part-based human representation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. Lecture Notes in Computer Science, vol. 13662, pp. 568–585. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20086-1_33
Park, S.I., Shin, H.J., Kim, T.H., Shin, S.Y.: On-line motion blending for real-time locomotion generation. Comput. Animat. Virtual Worlds 15(3–4), 125–138 (2004)
Parmar, D., Olafsson, S., Utami, D., Murali, P., Bickmore, T.: Designing empathic virtual agents: manipulating animation, voice, rendering, and empathy to create persuasive agents. Auton. Agent. Multi-Agent Syst. 36(1), 17 (2022)
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)
Rumman, N.A., Fratarcangeli, M.: Skin deformation methods for interactive character animation. In: Braz, J., et al. (eds.) VISIGRAPP 2016. CCIS, vol. 693, pp. 153–174. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64870-5_8
Savastano, M., Barnabei, R., Ricotta, F.: Going online while purchasing offline: an explorative analysis of omnichannel shopping behaviour in retail settings. In: Proceedings of the International Marketing Trends Conference, vol. 1, p. 22 (2016)
Scurati, G.W., Bertoni, M., Graziosi, S., Ferrise, F.: Exploring the use of virtual reality to support environmentally sustainable behavior: A framework to design experiences. Sustainability 13(2), 943 (2021)
Song, W., Wang, X., Gao, Y., Hao, A., Hou, X.: Real-time expressive avatar animation generation based on monocular videos. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, pp. 429–434. IEEE (2022)
Tang, M.T., Zhu, V.L., Popescu, V.: AlterEcho: loose avatar-streamer coupling for expressive VTubing. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, pp. 128–137. IEEE (2021)
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656 (2015)
Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660 (2014)
Xu, H., Bazavan, E.G., Zanfir, A., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: GHUM & GHUML: generative 3D human shape and articulated pose models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6184–6193 (2020)
Xu, Z., Zhou, Y., Kalogerakis, E., Landreth, C., Singh, K.: RigNet: neural rigging for articulated characters. ACM Trans. Graph. 39 (2020)
Yu, T., Zheng, Z., Guo, K., Liu, P., Dai, Q., Liu, Y.: Function4D: real-time human volumetric capture from very sparse consumer RGBD sensors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5746–5756 (2021)
Zhang, F., et al.: MediaPipe Hands: on-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020)
Zhang, Y., Li, Z., An, L., Li, M., Yu, T., Liu, Y.: Lightweight multi-person total motion capture using sparse multi-view cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5560–5569 (2021)
Acknowledgements
This research was developed in collaboration with Protocube Reply and VR@POLITO, and was supported by PON “Ricerca e Innovazione” 2014–2020 (DM 1062/2021) funds. The authors wish to thank Angela D’Antonio for her contribution to the design and implementation of the software.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cannavò, A., Pesando, R., Lamberti, F. (2023). A Framework for Animating Customized Avatars from Monocular Videos in Virtual Try-On Applications. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2023. Lecture Notes in Computer Science, vol 14218. Springer, Cham. https://doi.org/10.1007/978-3-031-43401-3_5
DOI: https://doi.org/10.1007/978-3-031-43401-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43400-6
Online ISBN: 978-3-031-43401-3