
Three stages of 3D virtual try-on network with appearance flow and shape field

  • Original article
  • Published in: The Visual Computer

Abstract

Virtual try-on technology can satisfy the demand for online shopping and lets consumers experience clothes online through image generation. Compared with image-based try-on, 3D virtual try-on methods enable multi-view try-on simulation and have attracted the attention of many researchers. Current 3D virtual try-on methods are mainly based on the thin-plate spline method and depth-information-based 3D reconstruction, which increases the cost of implementing 3D virtual try-on, and the results lack clothing details such as folds and patterns. To solve these problems, we propose a novel 3D virtual try-on network based on appearance flow and a shape field, called AFSF-3DVTON. Specifically, the network consists of three modules. First, the appearance flow warping module generates the desired warped clothes according to the appearance flow of the original clothes. Second, the flat try-on module performs geometric matching between the warped clothes and the reference person image and synthesizes the 2D try-on result. Third, to carry image details into the 3D try-on synthesis, a shape field-based reconstruction is adopted, which extracts shape features from the 2D try-on result to improve the quality of the 3D try-on reconstruction. We evaluate the proposed method on the VITON and MPV3D datasets against several state-of-the-art virtual try-on algorithms. The qualitative analyses verify the superiority of the proposed method, and the evaluation metrics, including Abs., Sq., and RMSE, show that the proposed network outperforms the baselines.
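The three-stage data flow described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every module body is a placeholder (the real network learns these mappings with CNNs), and all function names and the nearest-neighbour warp are our own assumptions; only the hand-off between stages mirrors the abstract.

```python
import numpy as np

def appearance_flow_warp(cloth, person):
    """Stage 1 (placeholder): predict a dense appearance flow (dx, dy per
    pixel) from the in-shop cloth and the reference person, then warp the
    cloth by sampling with that flow. Here the flow is all zeros."""
    h, w, _ = cloth.shape
    flow = np.zeros((h, w, 2))                 # stand-in for a learned flow field
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[..., 1], 0, h - 1).astype(int)
    src_x = np.clip(xs + flow[..., 0], 0, w - 1).astype(int)
    return cloth[src_y, src_x]                 # nearest-neighbour warp

def flat_try_on(warped_cloth, person):
    """Stage 2 (placeholder): fuse the warped cloth with the person image
    into a 2D try-on result; here a simple alpha blend stands in for the
    geometric-matching synthesis module."""
    return 0.5 * warped_cloth + 0.5 * person

def shape_field_reconstruct(try_on_2d):
    """Stage 3 (placeholder): lift the 2D result to 3D via a shape field;
    here a per-pixel pseudo-depth from mean intensity stands in for the
    learned shape-field reconstruction."""
    depth = try_on_2d.mean(axis=-1)
    return try_on_2d, depth

# Toy inputs: H x W x 3 images with values in [0, 1].
cloth = np.full((8, 8, 3), 0.8)
person = np.full((8, 8, 3), 0.2)

warped = appearance_flow_warp(cloth, person)
try_on = flat_try_on(warped, person)
result_rgb, result_depth = shape_field_reconstruct(try_on)
print(result_rgb.shape, result_depth.shape)    # (8, 8, 3) (8, 8)
```

The point of the sketch is the staged composition: each module consumes the previous module's output, so the 2D try-on quality from stage 2 directly bounds what the stage-3 reconstruction can recover.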



Data availability statement

The datasets analyzed during the current study are available in the VITON and MPV3D repositories. These datasets were derived from the following public domain resources: https://drive.google.com/file/d/1MxCUvKxejnwWnoZ-KoCyMCXo3TLhRuTo/view and https://drive.google.com/file/d/1qcynpXZ9eSlzTV-RDCr-Yip3GcuU314h/view?usp=sharing.


Funding

This work was supported by the National Natural Science Foundation of China (No. 62202346), the Hubei Key Research and Development Program (No. 2021BAA042), an open project of the Engineering Research Center of Hubei Province for Clothing Information (No. 2022HBCI01), the Wuhan Applied Basic Frontier Research Project (No. 2022013988065212), MIIT's AI Industry Innovation Task flagship projects (key technologies, equipment, and systems for flexible, customized, and intelligent manufacturing in the clothing industry), and the Hubei Science and Technology Project of the Safe Production Special Fund (scene control platform based on proprioception information computing of artificial intelligence).

Author information


Corresponding author

Correspondence to Feng Yu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Chen, Z., Yu, F., Jiang, M. et al. Three stages of 3D virtual try-on network with appearance flow and shape field. Vis Comput 39, 3545–3559 (2023). https://doi.org/10.1007/s00371-023-02946-3

