
A Framework for Animating Customized Avatars from Monocular Videos in Virtual Try-On Applications

  • Conference paper
  • Published in: Extended Reality (XR Salento 2023)

Abstract

Generating real-time animations for customized avatars is becoming of paramount importance, especially in Virtual Try-On applications, a technology that allows customers to explore or “try on” products virtually. Despite its numerous benefits, some aspects still prevent its applicability in real scenarios. The first limitation concerns the difficulty of generating expressive avatar animations. Moreover, potential customers often express concerns regarding the fidelity of the animations. To overcome these two limitations, this paper presents a framework for animating customized avatars based on state-of-the-art techniques, focusing mainly on the animation of the customized avatars. More specifically, the framework encompasses two components. The first automates the operations needed to generate the data structures used for avatar animation; it assumes that the avatar’s mesh is described through the Sparse Unified Part-Based Human Representation (SUPR). The second component animates the avatar through motion capture, making use of the MediaPipe Holistic pipeline. Experimental evaluations were carried out to assess the solutions proposed for pose beautification and joint estimation. Results demonstrated improvements in the quality of the reconstructed animation from both an objective and a subjective point of view.
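As an illustration of the input stage of the second component, the following is a minimal Python sketch (not the paper’s implementation) of extracting body, hand, and face landmarks from a monocular video with the MediaPipe Holistic pipeline; the video path and the final joint-mapping step are placeholders.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture("capture.mp4")  # hypothetical monocular input video
with mp_holistic.Holistic(
        static_image_mode=False,        # video mode: track landmarks across frames
        model_complexity=1,
        refine_face_landmarks=True,     # denser face mesh, incl. iris landmarks
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks:
            # 33 body landmarks in metres, centred at the hips; hand and face
            # landmarks are available via results.left_hand_landmarks,
            # results.right_hand_landmarks, and results.face_landmarks
            for lm in results.pose_world_landmarks.landmark:
                pass  # placeholder: retarget (lm.x, lm.y, lm.z) to the avatar skeleton
cap.release()
```

In the framework described in the paper, steps such as pose beautification and joint estimation would operate on landmark streams of this kind before driving the SUPR-based avatar.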


Notes

  1. Statistics on online sales: https://tinyurl.com/fashion-ecommerce-value.
  2. Blender: https://www.blender.org/.
  3. Autodesk Maya: https://tinyurl.com/bda6pcwa.
  4. Holistic landmarks detection: https://tinyurl.com/yc3m6b9v.
  5. Mixamo: https://www.mixamo.com/#/.
  6. Auto-Rig Pro: https://blendermarket.com/products/auto-rig-pro.
  7. Rigify: https://tinyurl.com/rigify.
  8. Unity: https://unity.com/.
  9. MediaPipe Guide: https://tinyurl.com/MediaPipe-Holistic.
  10. MediaPipeUnityPlugin: https://github.com/homuler/MediaPipeUnity.
  11. Face landmark detection guide: https://tinyurl.com/3vx5fjv9.


Acknowledgements

This research was developed in collaboration with Protocube Reply and VR@POLITO, and was supported by PON “Ricerca e Innovazione” 2014-2020 - DM 1062/2021 funds. The authors would like to thank Angela D’Antonio for her contribution to the design and implementation of the software.

Author information


Correspondence to Alberto Cannavò.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cannavò, A., Pesando, R., Lamberti, F. (2023). A Framework for Animating Customized Avatars from Monocular Videos in Virtual Try-On Applications. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2023. Lecture Notes in Computer Science, vol 14218. Springer, Cham. https://doi.org/10.1007/978-3-031-43401-3_5


  • DOI: https://doi.org/10.1007/978-3-031-43401-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43400-6

  • Online ISBN: 978-3-031-43401-3

  • eBook Packages: Computer Science (R0)
