Abstract
Virtual try-on technology meets the demand for online shopping by letting consumers experience clothes online through image generation. Compared with image-based try-on, 3D virtual try-on methods can simulate try-on from multiple viewpoints and have attracted the attention of many researchers. Current 3D virtual try-on methods are mainly based on the thin-plate spline method and depth information-based 3D reconstruction, which increases the cost of implementing 3D virtual try-on, and their results lack clothing details such as folds and patterns. To solve these problems, we propose a novel 3D virtual try-on network based on appearance flow and shape field, called AFSF-3DVTON. Specifically, the network consists of three modules. First, the appearance flow warping module generates the desired warped clothes according to the appearance flow of the original clothes. Second, the flat try-on module facilitates geometric matching between the warped clothes and the reference person image and synthesizes 2D try-on results. Third, to carry more image detail into the 3D try-on synthesis, shape field-based reconstruction is adopted, which extracts shape features from the 2D try-on results to improve the quality of the 3D try-on reconstruction. We evaluate the proposed method on the VITON and MPV3D datasets and compare it with several state-of-the-art virtual try-on algorithms. The qualitative analyses show that the proposed method produces better results than the compared methods, and the evaluation metrics, including Abs., Sq., and RMSE, show that the proposed network outperforms them.
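The abstract names the evaluation metrics only by abbreviation. As a minimal sketch, assuming Abs. and Sq. denote the absolute relative error and squared relative error commonly reported for depth estimation, and RMSE the root-mean-square error, the three values could be computed as follows. The function name `depth_errors` and the sample values are hypothetical and do not come from the paper.

```python
import math


def depth_errors(pred, gt):
    """Return (abs_rel, sq_rel, rmse) for predicted vs. ground-truth depths.

    Assumed definitions (not confirmed by the abstract):
      abs_rel = mean(|pred - gt| / gt)
      sq_rel  = mean((pred - gt)^2 / gt)
      rmse    = sqrt(mean((pred - gt)^2))
    Pixels whose ground-truth depth is not positive are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return abs_rel, sq_rel, rmse


# Hypothetical flattened depth maps with three pixels each.
pred = [1.0, 2.0, 4.0]
gt = [1.0, 4.0, 2.0]
abs_rel, sq_rel, rmse = depth_errors(pred, gt)
print(abs_rel, sq_rel, rmse)
```

Lower values are better for all three quantities under these definitions.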
Data availability statement
The datasets analyzed during the current study are available in the VITON and MPV3D repositories. These datasets were derived from the following public domain resources: https://drive.google.com/file/d/1MxCUvKxejnwWnoZ-KoCyMCXo3TLhRu To/view and https://drive.google.com/file/d/1qcynpXZ9eSlzTV- RDCr-Yip 3GcuU314h/view?usp=sharing.
Funding
This work was supported by the National Natural Science Foundation of China (No. 62202346), the Hubei Key Research and Development Program (No. 2021BAA042), the Open Project of the Engineering Research Center of Hubei Province for Clothing Information (No. 2022HBCI01), the Wuhan Applied Basic Frontier Research Project (No. 2022013988065212), MIIT’s AI Industry Innovation Task unveils flagship projects (Key technologies, equipment, and systems for flexible customized and intelligent manufacturing in the clothing industry), and the Hubei Science and Technology Project of Safe Production Special Fund (Scene control platform based on proprioception information computing of artificial intelligence).
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Yu, F., Jiang, M. et al. Three stages of 3D virtual try-on network with appearance flow and shape field. Vis Comput 39, 3545–3559 (2023). https://doi.org/10.1007/s00371-023-02946-3
DOI: https://doi.org/10.1007/s00371-023-02946-3