MM-VTON: A Multi-stage Virtual Try-on Method Using Multiple Image Features

  • Conference paper
International Conference on Neural Computing for Advanced Applications (NCAA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1869)


Abstract

Virtual try-on allows users to see how clothes would look on them without physically trying them on during a purchase. The technology has numerous applications in displaying clothing effects and proved especially useful during the pandemic, because it enables remote try-on without physical contact. Current virtual try-on methods, however, still struggle with clothing deformation, edge synthesis, and related problems. In this study, we present a new three-stage virtual try-on method that reduces the reliance on clothing regions in human images. To achieve this, we design a new semantic prediction module that fully removes clothing-related information from human images. Additionally, we introduce a new try-on module that fuses the extracted features using an adversarial loss, yielding significant improvements in try-on image quality. Experimental results demonstrate the effectiveness of our method, which achieves competitive results compared with state-of-the-art methods.
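
The abstract describes the architecture only at a high level, and the full text is not included in this preview. To make the three-stage idea concrete, the following is a minimal, hypothetical PyTorch-style sketch of such a pipeline; every module, layer size, tensor shape, and name below is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch only: the paper's actual architecture is not given in
# this preview. Module names, layer sizes, and the 20-class human-parsing
# layout are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticPredictor(nn.Module):
    """Stage 1 (assumed): predict a clothing-agnostic semantic layout,
    removing clothing-related information from the person image."""
    def __init__(self, in_ch: int = 3, num_classes: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, person: torch.Tensor) -> torch.Tensor:
        return self.net(person)  # per-pixel class logits

class GarmentWarper(nn.Module):
    """Stage 2 (assumed): deform the in-shop garment toward the body,
    conditioned on the predicted layout."""
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.net = nn.Conv2d(3 + num_classes, 3, 3, padding=1)

    def forward(self, cloth, layout):
        return self.net(torch.cat([cloth, layout], dim=1))

class TryOnGenerator(nn.Module):
    """Stage 3 (assumed): fuse the warped garment and the semantic layout
    into the final try-on image."""
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.net = nn.Conv2d(3 + num_classes, 3, 3, padding=1)

    def forward(self, warped, layout):
        return torch.tanh(self.net(torch.cat([warped, layout], dim=1)))

def generator_adv_loss(d_logits_fake: torch.Tensor) -> torch.Tensor:
    """Non-saturating GAN loss for the generator: push the discriminator's
    scores on synthesized try-on images toward 'real'."""
    return F.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))

# Toy usage at a resolution commonly used in try-on work (assumed):
person = torch.randn(1, 3, 256, 192)
cloth = torch.randn(1, 3, 256, 192)
layout = SemanticPredictor()(person)
result = TryOnGenerator()(GarmentWarper()(cloth, layout), layout)
```

Note that in published try-on systems the warping stage is typically a thin-plate-spline or appearance-flow transform rather than a single convolution; the plain layers here only keep the sketch self-contained and runnable.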



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant no. 61972112 and no. 61832004, the Guangdong Basic and Applied Basic Research Foundation under Grant no. 2021B1515020088, the Shenzhen Science and Technology Program under Grant no. JCYJ20210324131203009, and the HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication under Grant no. HITSZ-J&A-2021A01.

Author information


Corresponding author

Correspondence to Haijun Zhang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, G., Zhang, H., Mu, X., Ma, J. (2023). MM-VTON: A Multi-stage Virtual Try-on Method Using Multiple Image Features. In: Zhang, H., et al. International Conference on Neural Computing for Advanced Applications. NCAA 2023. Communications in Computer and Information Science, vol 1869. Springer, Singapore. https://doi.org/10.1007/978-981-99-5844-3_10

  • DOI: https://doi.org/10.1007/978-981-99-5844-3_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5843-6

  • Online ISBN: 978-981-99-5844-3

  • eBook Packages: Computer Science, Computer Science (R0)
