MM-VTON: A Multi-stage Virtual Try-on Method Using Multiple Image Features

  • Conference paper
International Conference on Neural Computing for Advanced Applications (NCAA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1869)


Abstract

Virtual try-on allows users to see how clothes would look on them without physically trying them on during a purchase. The technology has numerous applications in displaying clothing effects and proved especially useful during the pandemic, because it enables remote try-on without physical contact. Current virtual try-on methods, however, still struggle with clothing deformation, edge synthesis, and related problems. In this study, we present a new three-stage virtual try-on method that reduces the reliance on clothing regions in human images. To achieve this, we design a new semantic prediction module that fully removes clothing-related information from human images. Additionally, we introduce a new try-on module that fuses the extracted features using an adversarial loss, yielding significant improvements in try-on image quality. Experimental results demonstrate the effectiveness of our method, which achieves competitive results compared with state-of-the-art methods.
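
The abstract describes the architecture only at a high level, and the full text is not included in this preview. To make the three-stage idea concrete, the following is a minimal, hypothetical PyTorch-style sketch of such a pipeline; every module, layer size, tensor shape, and name below is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch only: the paper's actual architecture is not given in
# this preview. Module names, layer sizes, and the 20-class human-parsing
# layout are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticPredictor(nn.Module):
    """Stage 1 (assumed): predict a clothing-agnostic semantic layout,
    removing clothing-related information from the person image."""
    def __init__(self, in_ch: int = 3, num_classes: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, person: torch.Tensor) -> torch.Tensor:
        return self.net(person)  # per-pixel class logits

class GarmentWarper(nn.Module):
    """Stage 2 (assumed): deform the in-shop garment toward the body,
    conditioned on the predicted layout."""
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.net = nn.Conv2d(3 + num_classes, 3, 3, padding=1)

    def forward(self, cloth, layout):
        return self.net(torch.cat([cloth, layout], dim=1))

class TryOnGenerator(nn.Module):
    """Stage 3 (assumed): fuse the warped garment and the semantic layout
    into the final try-on image."""
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.net = nn.Conv2d(3 + num_classes, 3, 3, padding=1)

    def forward(self, warped, layout):
        return torch.tanh(self.net(torch.cat([warped, layout], dim=1)))

def generator_adv_loss(d_logits_fake: torch.Tensor) -> torch.Tensor:
    """Non-saturating GAN loss for the generator: push the discriminator's
    scores on synthesized try-on images toward 'real'."""
    return F.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))

# Toy usage at a resolution commonly used in try-on work (assumed):
person = torch.randn(1, 3, 256, 192)
cloth = torch.randn(1, 3, 256, 192)
layout = SemanticPredictor()(person)
result = TryOnGenerator()(GarmentWarper()(cloth, layout), layout)
```

Note that in published try-on systems the warping stage is typically a thin-plate-spline or appearance-flow transform rather than a single convolution; the plain layers here only keep the sketch self-contained and runnable.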



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant no. 61972112 and no. 61832004, the Guangdong Basic and Applied Basic Research Foundation under Grant no. 2021B1515020088, the Shenzhen Science and Technology Program under Grant no. JCYJ20210324131203009, and the HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication under Grant no. HITSZ-J&A-2021A01.

Author information


Corresponding author

Correspondence to Haijun Zhang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, G., Zhang, H., Mu, X., Ma, J. (2023). MM-VTON: A Multi-stage Virtual Try-on Method Using Multiple Image Features. In: Zhang, H., et al. International Conference on Neural Computing for Advanced Applications. NCAA 2023. Communications in Computer and Information Science, vol 1869. Springer, Singapore. https://doi.org/10.1007/978-981-99-5844-3_10

  • DOI: https://doi.org/10.1007/978-981-99-5844-3_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5843-6

  • Online ISBN: 978-981-99-5844-3

  • eBook Packages: Computer Science, Computer Science (R0)
