
Transform, Warp, and Dress: A New Transformation-guided Model for Virtual Try-on

Published: 16 February 2022

Abstract

Virtual try-on has recently emerged in the computer vision and multimedia communities with the development of architectures that can generate realistic images of a target person wearing a custom garment. This research interest is motivated by the large role played by e-commerce and online shopping in our society. Indeed, the virtual try-on task offers many opportunities to improve the efficiency of preparing fashion catalogs and to enhance the online user experience. The problem, however, is far from solved: current architectures do not reach sufficient accuracy with respect to manually generated images and can only be trained on image pairs with limited variety. Existing virtual try-on datasets have two main limits: they contain only female models, and all the images are available only in low resolution. This not only affects the generalization capabilities of the trained architectures but also makes deployment to real applications impractical. To overcome these issues, we present Dress Code, a new dataset for virtual try-on that contains high-resolution images of a large variety of upper-body clothes and both male and female models. Leveraging this enriched dataset, we propose a new model for virtual try-on capable of generating high-quality and photo-realistic images using a three-stage pipeline. The first two stages perform two different geometric transformations to warp the desired garment and fit it to the target person’s body pose and shape. Then, we generate the new image of that same person wearing the try-on garment using a generative network. We test the proposed solution on the most widely used dataset for this task as well as on our newly collected dataset, and demonstrate its effectiveness when compared to current state-of-the-art methods. Through extensive analyses on our Dress Code dataset, we show the adaptability of our model, which can generate try-on images even at a higher resolution.
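The geometric warping stages mentioned in the abstract are, in the virtual try-on literature, commonly parameterized as thin-plate spline (TPS) transformations whose parameters are regressed by a matching network. The paper's exact formulation is not reproduced here; the following is only an illustrative sketch of how a 2-D TPS warp is applied once its affine part and control-point weights have been estimated (all function and parameter names are hypothetical):

```python
import math

def tps_kernel(r2: float) -> float:
    """TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0 by convention."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def tps_warp(point, control_pts, weights, affine):
    """Warp a 2-D point: affine part plus a weighted sum of radial terms.

    affine: 2x3 matrix [[a00, a01, a02], [a10, a11, a12]] applied as
            (a00 + a01*x + a02*y, a10 + a11*x + a12*y).
    control_pts / weights: matched lists of (x, y) pairs; in a try-on
            pipeline these would be regressed from the garment and
            person representations by a warping module.
    """
    x, y = point
    out_x = affine[0][0] + affine[0][1] * x + affine[0][2] * y
    out_y = affine[1][0] + affine[1][1] * x + affine[1][2] * y
    for (cx, cy), (wx, wy) in zip(control_pts, weights):
        u = tps_kernel((x - cx) ** 2 + (y - cy) ** 2)
        out_x += wx * u
        out_y += wy * u
    return (out_x, out_y)

# With an identity affine part and no control points, the warp is a no-op.
identity = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(tps_warp((1.0, 2.0), [], [], identity))  # -> (1.0, 2.0)
```

In practice a model applies such a warp densely over a sampling grid (e.g., via a grid-sampling operator) rather than point by point; the final generative stage then blends the warped garment with the person image.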



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 2
  May 2022, 494 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3505207


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 February 2022
          • Accepted: 1 August 2021
          • Revised: 1 July 2021
          • Received: 1 March 2021
Published in TOMM Volume 18, Issue 2


          Qualifiers

          • research-article
          • Refereed
