
A Framework for Animating Customized Avatars from Monocular Videos in Virtual Try-On Applications

  • Conference paper
  • Published in: Extended Reality (XR Salento 2023)

Abstract

Generating real-time animations for customized avatars is becoming of paramount importance, especially in Virtual Try-On applications, a technology that allows customers to explore or “try on” products virtually. Despite its numerous benefits, some aspects still prevent its applicability in real scenarios. The first limitation concerns the difficulty of generating expressive avatar animations. Moreover, potential customers often express concerns regarding the fidelity of the animations. To overcome these two limitations, this paper presents a framework for animating customized avatars based on state-of-the-art techniques, focusing mainly on the animation of the customized avatars. More specifically, the framework encompasses two components. The first automates the operations needed to generate the data structures used for avatar animation; it assumes that the avatar’s mesh is described through the Sparse Unified Part-Based Human Representation (SUPR). The second component animates the avatar through motion capture, making use of the MediaPipe Holistic pipeline. Experimental evaluations were carried out to assess the solutions proposed for pose beautification and joint estimation. Results demonstrated improvements in the quality of the reconstructed animation from both an objective and a subjective point of view.
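As an illustration of the input stage of the second component, the following is a minimal Python sketch (not the paper’s implementation) of extracting body, hand, and face landmarks from a monocular video with the MediaPipe Holistic pipeline; the video path and the final joint-mapping step are placeholders.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture("capture.mp4")  # hypothetical monocular input video
with mp_holistic.Holistic(
        static_image_mode=False,        # video mode: track landmarks across frames
        model_complexity=1,
        refine_face_landmarks=True,     # denser face mesh, incl. iris landmarks
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks:
            # 33 body landmarks in metres, centred at the hips; hand and face
            # landmarks are available via results.left_hand_landmarks,
            # results.right_hand_landmarks, and results.face_landmarks
            for lm in results.pose_world_landmarks.landmark:
                pass  # placeholder: retarget (lm.x, lm.y, lm.z) to the avatar skeleton
cap.release()
```

In the framework described in the paper, steps such as pose beautification and joint estimation would operate on landmark streams of this kind before driving the SUPR-based avatar.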


Notes

  1. Statistics on online sales: https://tinyurl.com/fashion-ecommerce-value.
  2. Blender: https://www.blender.org/.
  3. Autodesk Maya: https://tinyurl.com/bda6pcwa.
  4. Holistic landmarks detection: https://tinyurl.com/yc3m6b9v.
  5. Mixamo: https://www.mixamo.com/#/.
  6. Auto-Rig Pro: https://blendermarket.com/products/auto-rig-pro.
  7. Rigify: https://tinyurl.com/rigify.
  8. Unity: https://unity.com/.
  9. MediaPipe Guide: https://tinyurl.com/MediaPipe-Holistic.
  10. MediaPipeUnityPlugin: https://github.com/homuler/MediaPipeUnity.
  11. Face landmark detection guide: https://tinyurl.com/3vx5fjv9.


Acknowledgements

This research was developed in collaboration with Protocube Reply and VR@POLITO, and was supported by PON “Ricerca e Innovazione” 2014-2020 - DM 1062/2021 funds. The authors would like to thank Angela D’Antonio for her contribution to the design and implementation of the software.

Author information


Correspondence to Alberto Cannavò.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cannavò, A., Pesando, R., Lamberti, F. (2023). A Framework for Animating Customized Avatars from Monocular Videos in Virtual Try-On Applications. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2023. Lecture Notes in Computer Science, vol 14218. Springer, Cham. https://doi.org/10.1007/978-3-031-43401-3_5


  • DOI: https://doi.org/10.1007/978-3-031-43401-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43400-6

  • Online ISBN: 978-3-031-43401-3

  • eBook Packages: Computer Science (R0)
