
TOAC: Try-On Aligning Conformer for Image-Based Virtual Try-On Alignment

  • Conference paper
  • Artificial Intelligence (CICAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14474)


Abstract

Recently, image-based virtual try-on has garnered increasing attention in online apparel e-commerce; its aim is to virtually superimpose garments onto portrait images. Image-based virtual try-on generally consists of two steps: alignment and generation. In this paper, we focus on Image-based Virtual Try-on Alignment (IVTA), which plays a pivotal role in virtual try-on by aligning the target garment with the portrait. Current approaches for IVTA mostly adopt Convolutional Neural Networks (CNNs) to extract local detailed features of both garments and portraits, ignoring the significance of global features. To address this problem, we propose a novel model named Try-On Aligning Conformer (TOAC) that effectively aligns the target garment with the portrait and improves virtual try-on. First, we integrate a Swin Transformer and a CNN to comprehensively extract both global patterns and local details. Second, we propose a robust learned perceptual loss between generatively reconstructed garment images and the ground truth to alleviate the overlap problem. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods for virtual try-on alignment.
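The learned perceptual loss mentioned in the abstract belongs to the LPIPS family of losses: the distance between the reconstructed garment and the ground truth is measured in the feature space of a network rather than in raw pixel space. The sketch below is illustrative only, not the paper's implementation: the random linear `extractors` stand in for frozen pretrained network stages, and the function name, layer sizes, and weights are all assumptions.

```python
import numpy as np

def perceptual_loss(x, y, extractors, weights):
    """Weighted sum of L2 distances between feature maps of x and y.

    In an LPIPS-style loss, `extractors` would be frozen stages of a
    pretrained network; here they are random linear maps for illustration.
    """
    loss = 0.0
    for f, w in zip(extractors, weights):
        fx, fy = f(x), f(y)
        # Normalize features before comparing, so each stage contributes
        # on a comparable scale regardless of its activation magnitude.
        fx = fx / (np.linalg.norm(fx) + 1e-8)
        fy = fy / (np.linalg.norm(fy) + 1e-8)
        loss += w * np.mean((fx - fy) ** 2)
    return loss

rng = np.random.default_rng(0)
# Two toy "stages" acting on a flattened 3x16x16 image.
W1 = rng.standard_normal((64, 3 * 16 * 16))
W2 = rng.standard_normal((128, 3 * 16 * 16))
extractors = [lambda im, W=W1: W @ im.ravel(),
              lambda im, W=W2: W @ im.ravel()]
weights = [1.0, 0.5]

gt = rng.random((3, 16, 16))                  # ground-truth garment image
recon = gt + 0.05 * rng.random((3, 16, 16))   # imperfect reconstruction

loss_same = perceptual_loss(gt, gt, extractors, weights)
loss_diff = perceptual_loss(recon, gt, extractors, weights)
```

An identical pair yields zero loss, while any deviation in the reconstruction produces a positive penalty, which is the signal used to discourage misaligned (overlapping) garment warps.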

Supported by National Natural Science Foundation of China (No. 62036012, 62276257).



Author information

Correspondence to Shengsheng Qian.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, Y., Xiang, W., Zhang, S., Xue, D., Qian, S. (2024). TOAC: Try-On Aligning Conformer for Image-Based Virtual Try-On Alignment. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds.) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol. 14474. Springer, Singapore. https://doi.org/10.1007/978-981-99-9119-8_3

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-9119-8_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9118-1

  • Online ISBN: 978-981-99-9119-8

  • eBook Packages: Computer Science, Computer Science (R0)
