Fabric Material Recovery from Video Using Multi-scale Geometric Auto-Encoder

Liang, Junbang; Lin, Ming

doi:10.1007/978-3-031-19836-6_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13697)

Included in the following conference series: European Conference on Computer Vision

Abstract

Fabric materials are central to recreating the realistic appearance of avatars in a virtual world and to many VR applications, ranging from virtual try-on and teleconferencing to character animation. We propose an end-to-end network model that uses video input to estimate the fabric materials of the garment worn by a human or an avatar in a virtual world. To achieve high accuracy, we jointly learn the human body and the garment geometry as conditions for material prediction. Due to the highly dynamic and deformable nature of cloth, general data-driven garment modeling remains a challenge. To address this problem, we propose a two-level auto-encoder that accounts for both the global and the local features of any garment geometry that would directly affect material perception. Using this network, we can also achieve smooth geometry transitions between different garment topologies. During estimation, we use a closed-loop optimization structure to share information between tasks and feed the learned garment features into the temporal estimation of garment materials. Experiments show that our proposed network structures improve material classification accuracy by 1.5x and remain applicable to unseen input. The model also runs at least three orders of magnitude faster than the state of the art [59, 61]. We demonstrate the recovered fabric materials on virtual try-on, where we recreate the entire avatar appearance, including body shape and pose, garment geometry, and materials, from only a single video.
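The idea of describing garment geometry at two levels, one global and one local, can be illustrated with a small hand-written sketch. This is not the authors' network and involves no learning: it assumes the garment is a list of 3D points, takes the overall centroid as a stand-in for the global code, and stores per-patch centre offsets plus point residuals as stand-ins for the local codes. The names `encode`, `decode`, and `patch_size` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
LocalCode = Tuple[Point, List[Point]]


def centroid(points: List[Point]) -> Point:
    """Mean position of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def encode(points: List[Point], patch_size: int) -> Tuple[Point, List[LocalCode]]:
    """Split geometry into one global code and a list of per-patch local codes.

    Assumption: consecutive points form a patch. A real system would group
    points by surface neighbourhood and learn both codes instead.
    """
    global_code = centroid(points)  # global level: overall placement
    local_codes = []
    for start in range(0, len(points), patch_size):
        patch = points[start:start + patch_size]
        c = centroid(patch)
        # Local level: patch centre relative to the global code,
        # then each point relative to its patch centre.
        center_offset = tuple(c[i] - global_code[i] for i in range(3))
        residuals = [tuple(p[i] - c[i] for i in range(3)) for p in patch]
        local_codes.append((center_offset, residuals))
    return global_code, local_codes


def decode(global_code: Point, local_codes: List[LocalCode]) -> List[Point]:
    """Rebuild the points by adding local codes back onto the global code."""
    points = []
    for center_offset, residuals in local_codes:
        c = tuple(global_code[i] + center_offset[i] for i in range(3))
        points.extend(tuple(c[i] + r[i] for i in range(3)) for r in residuals)
    return points


if __name__ == "__main__":
    cloth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
             (1.0, 2.0, 4.0), (3.0, 1.0, 1.0)]
    g, codes = encode(cloth, patch_size=2)
    print("global code:", g)
    print("number of local codes:", len(codes))
    print("reconstruction:", decode(g, codes))
```

In this toy version the decomposition is lossless up to floating-point rounding. A learned auto-encoder would instead compress both levels into low-dimensional features, which is what would make them usable as conditions for a downstream material classifier.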

References

Alldieck, T., Magnor, M.A., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 1175–1186. Computer Vision Foundation/IEEE (2019). https://doi.org/10.1109/CVPR.2019.00127, http://openaccess.thecvf.com/content_CVPR_2019/html/Alldieck_Learning_to_Reconstruct_People_in_Clothing_From_a_Single_RGB_CVPR_2019_paper.html
Alldieck, T., Magnor, M.A., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3d people models. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018, pp. 8387–8397. IEEE Computer Society (2018). https://doi.org/10.1109/CVPR.2018.00875, http://openaccess.thecvf.com/content_cvpr_2018/html/Alldieck_Video_Based_Reconstruction_CVPR_2018_paper.html
Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.A.: Tex2shape: detailed full human body geometry from a single image. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, 2019, pp. 2293–2303. IEEE (2019). https://doi.org/10.1109/ICCV.2019.00238
Bednarik, J., Parashar, S., Gundogdu, E., Salzmann, M., Fua, P.: Shape reconstruction by learning differentiable surface representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4716–4725 (2020)
Bhat, K.S., Twigg, C.D., Hodgins, J.K., Khosla, P., Popovic, Z., Seitz, S.M.: Estimating cloth simulation parameters from video (2003)
Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-garment net: learning to dress 3d people from images. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, 2019, pp. 5419–5429. IEEE (2019). https://doi.org/10.1109/ICCV.2019.00552
Bi, W., Jin, P., Nienborg, H., Xiao, B.: Estimating mechanical properties of cloth from videos using dense motion trajectories: Human psychophysics and machine learning. J. Vision 18(5), 12–12 (2018)
Bi, W., Xiao, B.: Perceptual constancy of mechanical properties of cloth under variation of external forces. In: Proceedings of the ACM Symposium on Applied Perception, pp. 19–23 (2016)
Bickel, B., et al.: Design and fabrication of materials with desired deformation behavior. ACM Trans. Graph. (TOG) 29(4), 1–10 (2010)
Bouman, K.L., Xiao, B., Battaglia, P., Freeman, W.T.: Estimating the material properties of fabric from video. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1984–1991 (2013)
Bradley, D., Popa, T., Sheffer, A., Heidrich, W., Boubekeur, T.: Markerless garment capture. In: ACM SIGGRAPH 2008 papers, pp. 1–9 (2008)
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017)
Casati, R., Daviet, G., Bertails-Descoubes, F.: Inverse elastic cloth design with contact and friction. Ph.D. thesis, Inria Grenoble Rhône-Alpes, Université de Grenoble (2016)
Chen, X., Zhou, B., Lu, F.X., Wang, L., Bi, L., Tan, P.: Garment modeling with a depth camera. ACM Trans. Graph. 34(6), 203–1 (2015)
Clyde, D., Teran, J., Tamstorf, R.: Modeling and data-driven parameter estimation for woven fabrics. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 1–11 (2017)
Daněřek, R., Dibra, E., Öztireli, C., Ziegler, R., Gross, M.: Deepgarment: 3d garment shape estimation from a single image. In: Computer Graphics Forum, vol. 36, pp. 269–280. Wiley Online Library (2017)
Deprelle, T., Groueix, T., Fisher, M., Kim, V., Russell, B., Aubry, M.: Learning elementary structures for 3d shape generation and matching. In: Advances in Neural Information Processing Systems, pp. 7433–7443 (2019)
Feydy, J., Séjourné, T., Vialard, F.X., Amari, S.I., Trouvé, A., Peyré, G.: Interpolating between optimal transport and mmd using sinkhorn divergences. arXiv preprint arXiv:1810.08278 (2018)
Gong, W., et al.: Human pose estimation from monocular images: a comprehensive survey. Sensors 16(12), 1966 (2016)
Guarnera, G.C., Hall, P., Chesnais, A., Glencross, M.: Woven fabric model creation from a single image. ACM Trans. Graph. (TOG) 36(5), 1–13 (2017)
Gundogdu, E., Constantin, V., Seifoddini, A., Dang, M., Salzmann, M., Fua, P.: Garnet: a two-stream network for fast and accurate 3d cloth draping. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8739–8748 (2019)
Habermann, M., Xu, W., Zollhoefer, M., Pons-Moll, G., Theobalt, C.: Deepcap: monocular human performance capture using weak supervision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, June 2020
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: Arch: animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2020)
Jeong, M.H., Han, D.H., Ko, H.S.: Garment capture from a photograph. Comput. Animation Virtual Worlds 26(3–4), 291–300 (2015)
Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: Bcnet: learning body and cloth shape from a single image. arXiv preprint arXiv:2004.00214 (2020)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018)
Klokov, R., Lempitsky, V.: Escape from cells: Deep KD-networks for the recognition of 3d point cloud models. In: The IEEE International Conference on Computer Vision (ICCV), October 2017
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2252–2261 (2019)
Lahner, Z., Cremers, D., Tung, T.: Deepwrinkles: accurate and realistic clothing modeling. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 667–684 (2018)
Li, J., Chen, B.M., Hee Lee, G.: So-net: self-organizing network for point cloud analysis. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on x-transformed points. In: Advances in Neural Information Processing Systems, pp. 820–830 (2018)
Liang, J., Lin, M.C.: Shape-aware human pose and shape reconstruction using multi-view images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4352–4362 (2019)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. (TOG) 34(6), 1–16 (2015)
Ma, Q., Saito, S., Yang, J., Tang, S., Black, M.J.: Scale: modeling clothed humans with a surface codec of articulated local elements. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16082–16093 (2021)
Mehta, D., et al.: VNect: real-time 3d human pose estimation with a single RGB camera. ACM Trans. Graph. (TOG) 36(4), 1–14 (2017)
Miguel, E., et al.: Data-driven estimation of cloth simulation models. In: Computer Graphics Forum, vol. 31, pp. 519–528. Wiley Online Library (2012)
Miguel, E., et al.: Modeling and estimation of internal friction in cloth. ACM Trans. Graph. (TOG) 32(6), 1–10 (2013)
Patel, C., Liao, Z., Pons-Moll, G.: TailorNet: predicting clothing in 3d as a function of human pose, shape and garment style. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, June 2020
Pumarola, A., Sanchez-Riera, J., Choi, G., Sanfeliu, A., Moreno-Noguer, F.: 3dpeople: Modeling the geometry of dressed humans. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2242–2251 (2019)
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3d object detection from RGB-d data. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Rasheed, A.H., Romero, V., Bertails-Descoubes, F., Wuhrer, S., Franco, J.S., Lazarus, A.: Learning to measure the static friction coefficient in cloth contact. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9912–9921 (2020)
Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2304–2314 (2019)
Saltelli, A.: Sensitivity analysis for importance assessment. Risk Anal. 22(3), 579–590 (2002)
Santesteban, I., Otaduy, M.A., Casas, D.: Learning-based animation of clothing for virtual try-on. In: Computer Graphics Forum, vol. 38, pp. 355–366. Wiley Online Library (2019)
Smith, D., Loper, M., Hu, X., Mavroidis, P., Romero, J.: Facsimile: fast and accurate scans from an image in less than a second. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5330–5339 (2019)
Tan, Q., Pan, Z., Gao, L., Manocha, D.: Realtime simulation of thin-shell deformable materials using CNN-based mesh embedding. IEEE Robot. Autom. Lett. 5(2), 2325–2332 (2020)
Thies, J., Zollhöfer, M., Nießner, M.: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Graph. (TOG) 38(4), 1–12 (2019)
Tiwari, G., Bhatnagar, B.L., Tung, T., Pons-Moll, G.: Sizer: a dataset and model for parsing 3d clothing and learning size sensitive 3d clothing. arXiv preprint arXiv:2007.11610 (2020)
Varol, G., et al.: Bodynet: Volumetric inference of 3d human body shapes. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–36 (2018)
Vidaurre, R., Casas, D., Garces, E., Lopez-Moreno, J.: BRDF estimation of complex materials with nested learning. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1347–1356. IEEE (2019)
Wang, H., O’Brien, J.F., Ramamoorthi, R.: Data-driven elastic models for cloth: modeling and measurement. ACM Trans. Graph. (TOG) 30(4), 1–12 (2011)
Wang, T.Y., Ceylan, D., Popovic, J., Mitra, N.J.: Learning a shared shape space for multimodal garment design. arXiv preprint arXiv:1806.11335 (2018)
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38(5), 1–12 (2019)
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)
Xu, Y., Zhu, S.C., Tung, T.: DenseRaC: joint 3d pose and shape estimation by dense render-and-compare. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7760–7770 (2019)
Yang, S., et al.: Detailed garment recovery from a single-view image. arXiv preprint arXiv:1608.01250 (2016)
Yang, S., Liang, J., Lin, M.C.: Learning-based cloth material recovery from video. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4383–4393 (2017)
Yang, S., et al.: Physics-inspired garment recovery from a single-view image. ACM Trans. Graph. (TOG) 37(5), 1–14 (2018)
Yu, T., et al.: SimulCap: Single-view human performance capture with cloth simulation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5499–5509. IEEE (2019)
Zakharkin, I., Mazur, K., Grigorev, A., Lempitsky, V.: Point-based modeling of human clothing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14718–14727 (2021)
Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: DeepHuman: 3d human reconstruction from a single image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7739–7749 (2019)
Zhou, B., Chen, X., Fu, Q., Guo, K., Tan, P.: Garment modeling from a single image. In: Computer Graphics Forum, vol. 32, pp. 85–91. Wiley Online Library (2013)
Zhou, Y., Tuzel, O.: VoxelNet: End-to-end learning for point cloud based 3d object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Zhu, H., et al.: Deep fashion3d: a dataset and benchmark for 3d garment reconstruction from single images. arXiv preprint arXiv:2003.12753 (2020)

Author information

Authors and Affiliations

Amazon, Seattle, USA
Junbang Liang
University of Maryland, College Park, USA
Ming Lin

Corresponding author

Correspondence to Junbang Liang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Liang, J., Lin, M. (2022). Fabric Material Recovery from Video Using Multi-scale Geometric Auto-Encoder. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_39

DOI: https://doi.org/10.1007/978-3-031-19836-6_39
Published: 22 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19835-9
Online ISBN: 978-3-031-19836-6
eBook Packages: Computer Science, Computer Science (R0)
