Three stages of 3D virtual try-on network with appearance flow and shape field

Chen, Ziyi; Yu, Feng; Jiang, Minghua; Wang, Hua; Hua, Ailing; Peng, Tao; Hu, Xinrong; Zhu, Ping

doi:10.1007/s00371-023-02946-3

Original article
Published: 24 July 2023

Volume 39, pages 3545–3559, (2023)

The Visual Computer

Ziyi Chen¹, Feng Yu¹ (ORCID: orcid.org/0000-0001-8252-5131), Minghua Jiang¹, Hua Wang¹, Ailing Hua¹, Tao Peng¹, Xinrong Hu¹ & Ping Zhu¹

Abstract

Virtual try-on technology can satisfy the demands of online shopping and helps consumers experience clothes sold online through image generation. Compared with image-based try-on, 3D virtual try-on methods can simulate try-on from multiple viewpoints and have attracted the attention of many researchers. Current 3D virtual try-on methods are mainly based on the thin-plate spline method and depth-information-based 3D reconstruction, which increases the cost of implementing 3D virtual try-on and yields results that lack clothing details such as folds and patterns. To solve these problems, we propose a novel 3D virtual try-on network based on appearance flow and shape field, called AFSF-3DVTON. Specifically, the network consists of three modules. First, the appearance flow warping module generates the desired warped clothes according to the appearance flow of the original clothes. Second, the flat try-on module facilitates geometric matching between the warped clothes and the reference person image and synthesizes 2D try-on results. Third, to carry more image detail into the 3D try-on synthesis, a shape field-based reconstruction is adopted, which extracts shape features from the 2D try-on results to improve the quality of the 3D try-on reconstruction. We evaluate the proposed method on the VITON and MPV3D datasets against several state-of-the-art virtual try-on algorithms. Qualitative analyses support the superiority of the proposed method, and the evaluation metrics, including Abs., Sq., and RMSE, show that the proposed network outperforms the compared methods.
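As a rough illustration of the first stage, appearance-flow warping can be read as resampling the clothing image along a per-pixel offset field. The sketch below is not the authors' implementation: it assumes a grayscale image stored as nested lists, a flow given as (dx, dy) offsets per output pixel, and bilinear interpolation with border clamping. The names `bilinear_sample` and `warp_with_flow` are hypothetical, and in the paper the flow would be predicted by a learned network rather than written by hand.

```python
from typing import List, Tuple

Image = List[List[float]]                 # H x W grayscale image
Flow = List[List[Tuple[float, float]]]    # per-pixel (dx, dy) offsets (assumed layout)


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at continuous (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_with_flow(img: Image, flow: Flow) -> Image:
    """Backward warp: output[y][x] is img sampled at (x + dx, y + dy)."""
    return [
        [bilinear_sample(img, x + dx, y + dy) for x, (dx, dy) in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# Toy 3 x 3 "cloth" and two hand-written flows (illustrative values only).
cloth = [[0.0, 1.0, 2.0],
         [3.0, 4.0, 5.0],
         [6.0, 7.0, 8.0]]
shift_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
half_step = [[(0.5, 0.0)] * 3 for _ in range(3)]

warped = warp_with_flow(cloth, shift_right)
blended = warp_with_flow(cloth, half_step)
```

Here `shift_right` makes every output pixel read from its right-hand neighbour, while `half_step` blends adjacent pixels. Unlike a thin-plate spline, which is controlled by a small set of control points, a dense field of this kind can in principle move each pixel independently.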

Data availability statement

The datasets analyzed during the current study are available in the VITON and MPV3D repositories. These datasets were derived from the following public domain resources: https://drive.google.com/file/d/1MxCUvKxejnwWnoZ-KoCyMCXo3TLhRu To/view https://drive.google.com/file/d/1qcynpXZ9eSlzTV- RDCr-Yip 3GcuU314h/view?usp=sharing.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62202346), the Hubei Key Research and Development Program (No. 2021BAA042), the Open Project of the Engineering Research Center of Hubei Province for Clothing Information (No. 2022HBCI01), the Wuhan Applied Basic Frontier Research Project (No. 2022013988065212), MIIT’s AI Industry Innovation Task unveils flagship projects (Key technologies, equipment, and systems for flexible customized and intelligent manufacturing in the clothing industry), and the Hubei Science and Technology Project of Safe Production Special Fund (Scene control platform based on proprioception information computing of artificial intelligence).

Author information

Ziyi Chen and Feng Yu have contributed equally to this work.

Authors and Affiliations

Department of Computer and Artificial Intelligent, Wuhan Textile University, Yangguang Street, Wuhan, 430200, Hubei, China
Ziyi Chen, Feng Yu, Minghua Jiang, Hua Wang, Ailing Hua, Tao Peng, Xinrong Hu & Ping Zhu

Corresponding author

Correspondence to Feng Yu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Z., Yu, F., Jiang, M. et al. Three stages of 3D virtual try-on network with appearance flow and shape field. Vis Comput 39, 3545–3559 (2023). https://doi.org/10.1007/s00371-023-02946-3

Accepted: 09 June 2023
Published: 24 July 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00371-023-02946-3
