research-article

Fashion Image Search via Anchor-Free Detector

Author: Mingbo Zhao

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval

Pages 416 - 425

https://doi.org/10.1145/3512527.3531355

Published: 27 June 2022

Abstract

Clothes image search is a key technique for retrieving the clothing items most relevant to the query clothes supplied by a customer. In this work, we propose an anchor-free framework for clothes image search that adds a Re-ID branch for similarity learning and a global mask branch for instance segmentation. The Re-ID branch extracts richer features of the target clothes: we develop a mask pooling layer that aggregates features using the mask of the target clothes as guidance, so the extracted feature covers the information in the whole masked area of the target rather than only its center point. The global mask branch is trained simultaneously with the detection and Re-ID branches, and the mask it estimates for the target clothes is used during inference to guide feature extraction. Finally, to further improve retrieval performance, we introduce a match loss to fine-tune the Re-ID embedding branch, so that embeddings of the same clothing item are pulled closer together while embeddings of different items are pushed farther apart. Extensive simulations verify the effectiveness of the proposed framework.
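The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. Mask pooling is read here as masked average pooling over a feature map, and the match loss is stood in for by a standard triplet margin loss on L2-normalized embeddings; the paper's actual layer and loss may differ. The function names `mask_pool` and `triplet_match_loss`, the `margin` value, and the array shapes are all hypothetical and do not come from the paper.

```python
import numpy as np


def mask_pool(features: np.ndarray, mask: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Masked average pooling (assumed reading of 'mask pooling').

    features: (C, H, W) feature map.
    mask:     (H, W) binary or soft mask of the target clothes.
    Returns a (C,) embedding averaged over the masked pixels only.
    """
    weights = mask.astype(features.dtype)
    # Broadcast the mask over channels and sum the selected activations.
    total = (features * weights[None, :, :]).sum(axis=(1, 2))
    # eps avoids division by zero when the mask is empty.
    return total / (weights.sum() + eps)


def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / (np.linalg.norm(v) + eps)


def triplet_match_loss(anchor, positive, negative, margin: float = 0.3) -> float:
    """Triplet margin loss, used here as a stand-in for the paper's match loss.

    It is zero once the anchor is closer to the positive (same item)
    than to the negative (different item) by at least `margin`.
    """
    a, p, n = (l2_normalize(np.asarray(x, dtype=float)) for x in (anchor, positive, negative))
    d_pos = np.linalg.norm(a - p)
    d_neg = np.linalg.norm(a - n)
    return float(max(0.0, d_pos - d_neg + margin))


if __name__ == "__main__":
    feats = np.array([[[1.0, 2.0], [3.0, 4.0]],
                      [[10.0, 20.0], [30.0, 40.0]]])
    mask = np.array([[1, 0], [0, 1]])
    print(mask_pool(feats, mask))
    print(triplet_match_loss([1, 0], [1, 0], [0, 1]))
```

Compared with reading the embedding at a single center location, pooling over the mask lets every pixel on the garment contribute, which is the motivation the abstract states for the layer.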

Supplementary Material

Presentation video: ICMR22-fp005.mp4 (MP4, 22.99 MB)



Index Terms

Fashion Image Search via Anchor-Free Detector
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search


Published In

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval

June 2022, 714 pages

ISBN: 9781450392389

DOI: 10.1145/3512527

General Chairs:
Vincent Oria, New Jersey Institute of Technology, USA
Maria Luisa Sapino, Università degli Studi di Torino, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Brigitte Kerhervé, Université du Québec à Montréal, Canada

Program Chairs:
Wen-Huang Cheng, National Yang Ming Chiao Tung University, Taiwan
Ichiro Ide, Nagoya University, Japan
Vivek Singh, Rutgers University, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

ICMR '22: International Conference on Multimedia Retrieval

June 27 - 30, 2022

Newark, NJ, USA

Sponsor: SIGMM

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
