ABSTRACT
Sketch re-identification (Sketch Re-ID) uses a sketch of a pedestrian to retrieve the corresponding photos from surveillance videos. It allows a pedestrian to be tracked from a sketch drawn according to an eyewitness description, without requiring a query photo. Although the Sketch Re-ID task has already been proposed, the gap between the sketch and photo modalities still greatly hinders pedestrian identity matching. Inspired by the idea of transplantation without rejection, we propose a Cross-Compatible Embedding (CCE) approach to narrow this gap, together with a Semantic Consistent Feature Construction (SCFC) scheme to enhance feature discrimination. Guided by identity consistency, CCE exchanges local tokens between the two modalities within a Transformer framework, enabling the model to extract modality-compatible features. SCFC improves the representation ability of the features by handling the inconsistency of information at the same location in a sketch and the corresponding pedestrian photo: it divides the local tokens of pedestrian images from different modalities into groups and assigns specific semantic information to each group, constructing a semantically consistent global feature representation. Experiments on the public Sketch Re-ID dataset confirm the effectiveness of the proposed method and its superiority over existing methods. Further experiments on the sketch-based image retrieval datasets QMUL-Shoe-v2 and QMUL-Chair-v2 assess the method's generalization. The results show that the proposed method outperforms the compared state-of-the-art methods. The source code of our method is available at: https://github.com/lhf12278/CCSC.
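The two ideas named in the abstract, token-level exchange between modalities and grouping of local tokens into a global feature, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the random choice of which tokens to swap, the `swap_ratio` parameter, and the contiguous mean-pooled groups are all assumptions made purely for illustration; the abstract does not specify these details.

```python
import numpy as np


def exchange_local_tokens(sketch_tokens, photo_tokens, swap_ratio=0.3, rng=None):
    """Swap a subset of local tokens between a sketch and a photo.

    Both arrays have shape (num_tokens, dim) and are assumed to belong to
    the same identity. Picking the swapped positions at random is an
    illustrative assumption, not the selection rule used in the paper.
    """
    if sketch_tokens.shape != photo_tokens.shape:
        raise ValueError("token arrays must have the same shape")
    rng = np.random.default_rng() if rng is None else rng

    num_tokens = sketch_tokens.shape[0]
    num_swap = int(round(swap_ratio * num_tokens))
    # Positions whose tokens are interchanged across the two modalities.
    idx = rng.choice(num_tokens, size=num_swap, replace=False)

    mixed_sketch = sketch_tokens.copy()
    mixed_photo = photo_tokens.copy()
    mixed_sketch[idx] = photo_tokens[idx]
    mixed_photo[idx] = sketch_tokens[idx]
    return mixed_sketch, mixed_photo, idx


def grouped_global_feature(tokens, num_groups=4):
    """Build a global feature by pooling groups of local tokens.

    Tokens are split into contiguous groups (an assumption), each group
    is mean-pooled, and the pooled vectors are concatenated.
    """
    groups = np.array_split(tokens, num_groups, axis=0)
    return np.concatenate([group.mean(axis=0) for group in groups])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sketch = rng.normal(size=(8, 16))  # 8 local tokens, 16 dimensions each
    photo = rng.normal(size=(8, 16))
    mixed_sketch, mixed_photo, idx = exchange_local_tokens(sketch, photo, 0.25, rng)
    feature = grouped_global_feature(mixed_sketch, num_groups=4)
    print(idx.shape, feature.shape)
```

In this toy setup, the swapped positions carry the other modality's tokens while all remaining positions are left untouched, and the grouped feature has length `num_groups * dim`.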
Index Terms
- Cross-Compatible Embedding and Semantic Consistent Feature Construction for Sketch Re-identification