DOI: 10.1145/3512527.3531355
research-article

Fashion Image Search via Anchor-Free Detector

Published: 27 June 2022

Abstract

Clothes image search retrieves the clothing items most relevant to a query garment supplied by a customer. In this work, we propose an anchor-free framework for clothes image search that adds a Re-ID branch for similarity learning and a global mask branch for instance segmentation. The Re-ID branch extracts richer features of the target clothes; for it we develop a mask pooling layer that aggregates features using the mask of the target clothes as guidance. The extracted feature therefore covers the information inside the mask area of the target rather than only its center point. The global mask branch is trained jointly with the detection and Re-ID branches, so that the estimated mask of the target clothes can guide feature extraction during inference. Finally, to further improve retrieval performance, we introduce a match loss to fine-tune the Re-ID embedding branch, so that embeddings of the same clothes item are pulled closer together while embeddings of different clothes items are pushed farther apart. Extensive simulations have been conducted, and the results verify the effectiveness of the proposed framework.
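
The abstract does not give the exact form of the mask pooling layer, so the following is only a minimal sketch of one plausible reading: a mask-weighted average of per-pixel feature vectors, contrasted with reading the feature at the center point alone. The names `mask_pool` and `center_feature`, the H x W x C list layout, and the `eps` term are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical layouts: features is H x W x C, mask is H x W with values in [0, 1].
FeatureMap = Sequence[Sequence[Sequence[float]]]
Mask = Sequence[Sequence[float]]


def mask_pool(features: FeatureMap, mask: Mask, eps: float = 1e-6) -> List[float]:
    """Average the feature vectors of all pixels, weighted by the mask value."""
    channels = len(features[0][0])
    pooled = [0.0] * channels
    total_weight = 0.0
    for feat_row, mask_row in zip(features, mask):
        for vec, weight in zip(feat_row, mask_row):
            total_weight += weight
            for c in range(channels):
                pooled[c] += weight * vec[c]
    # eps (an assumption) avoids division by zero when the mask is empty.
    return [value / (total_weight + eps) for value in pooled]


def center_feature(features: FeatureMap, cy: int, cx: int) -> List[float]:
    """Baseline for comparison: the feature vector at the center point only."""
    return list(features[cy][cx])


# Toy 2 x 2 feature map with 2 channels; the mask covers the top row only.
features = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 5.0], [0.0, 7.0]],
]
mask = [
    [1.0, 1.0],
    [0.0, 0.0],
]

pooled = mask_pool(features, mask)          # uses both masked pixels
center = center_feature(features, 0, 0)     # uses a single pixel
```

In this toy case the pooled vector averages the two masked pixels in the top row, while the center-point vector reflects only one of them. That difference is the intuition the abstract gives for pooling over the mask area.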

Supplementary Material

MP4 File (ICMR22-fp005.mp4)
Presentation Video.

Published In

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval
June 2022
714 pages
ISBN:9781450392389
DOI:10.1145/3512527
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anchor-free detectors
  2. clothes image search
  3. end-to-end learning
  4. object detection
  5. re-identity

Qualifiers

  • Research-article

Conference

ICMR '22
Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
