Abstract
Image-based virtual try-on, which aims to virtually superimpose garments onto portrait images, has recently garnered increasing attention in online apparel e-commerce. It generally consists of two steps: alignment and generation. In this paper, we focus on Image-based Virtual Try-on Alignment (IVTA), which plays a pivotal role in virtual try-on by aligning the target garment with the portrait. Current approaches to IVTA mostly adopt Convolutional Neural Networks (CNNs) to extract local detailed features of garments and portraits, ignoring the significance of global features. To address this problem, we propose a novel model named Try-On Aligning Conformer (TOAC) that effectively aligns the target garment with the portrait and thereby improves virtual try-on. First, we integrate a Swin Transformer and a CNN to comprehensively extract both global patterns and local details. Second, we propose a robust learned perceptual loss between generatively reconstructed garment images and the ground truth to alleviate the overlap problem. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods for virtual try-on alignment.
Supported by National Natural Science Foundation of China (No. 62036012, 62276257).
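The abstract's core idea, pairing a CNN branch for local detail with a transformer branch for global context, can be illustrated with a minimal NumPy sketch. This is not the authors' TOAC implementation (which uses a Swin Transformer and learned weights); the function names and the tiny fixed kernel are hypothetical stand-ins showing how a local convolution and a global self-attention step over the same feature map can be fused per position.

```python
import numpy as np

def local_branch(x, kernel):
    # x: (H, W) feature map; a 3x3 convolution captures local detail (the CNN role)
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * kernel)
    return out

def global_branch(x):
    # Flatten positions to tokens and apply one self-attention step:
    # every position attends to every other, capturing global patterns
    # (the transformer role).
    tokens = x.reshape(-1, 1)                       # (H*W, 1) tokens
    scores = tokens @ tokens.T                      # pairwise similarity
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over positions
    return (attn @ tokens).reshape(x.shape)

def fuse(x, kernel):
    # Schematic fusion: each spatial position carries one local
    # and one global feature channel.
    return np.stack([local_branch(x, kernel), global_branch(x)], axis=-1)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
fused = fuse(feat, np.ones((3, 3)) / 9.0)
print(fused.shape)  # (8, 8, 2)
```

In the actual model, both branches would be deep, learned feature extractors and the fusion would feed the alignment (warping) module; the sketch only conveys why combining the two receptive-field regimes gives each position both fine detail and scene-wide context.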
Copyright information
Ā© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Y., Xiang, W., Zhang, S., Xue, D., Qian, S. (2024). TOAC: Try-On Aligning Conformer for Image-Based Virtual Try-On Alignment. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds.) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol. 14474. Springer, Singapore. https://doi.org/10.1007/978-981-99-9119-8_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9118-1
Online ISBN: 978-981-99-9119-8