Abstract
Generating pseudo-labels by clustering has proved effective for unsupervised domain adaptation (UDA) person re-identification (re-ID). However, the pseudo-labels are noisy, which limits further improvement of model performance. Extracting representative features is the key to solving this problem. In this paper, we propose the Part-Pixel Transformer with Smooth Alignment Fusion Network (PTFNet) to capture richer discriminative pedestrian features. Specifically, we design a Part-Pixel Transformer (PPformer) to model long-range dependencies between features. It splits the image horizontally to obtain parts whose regions are highly correlated, and then captures pixel-level interactions within each horizontal part. In addition, we propose a Smooth Alignment Fusion (SAF) module, composed of a Smooth Alignment block (SA-Block) and a Cross-layer Fusion block (CF-Block). First, the SA-Block smooths the cross-layer features to reduce the semantic gap between features of different layers. The smoothed features are then fed into the CF-Block, which aggregates low-level features carrying spatial information with high-level features carrying semantic information. Extensive experiments show that the proposed method significantly outperforms previous works on UDA tasks for person re-ID.
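The part-then-pixel idea in the abstract can be illustrated with a small sketch. This is not the authors' PPformer: the abstract does not give the layer design, so the code below assumes plain scaled dot-product self-attention with identity query, key and value projections, and the names `self_attention` and `part_pixel_attention` are hypothetical. It only shows how a feature map might be cut into horizontal parts so that pixels interact within their own part.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of C-dim tokens.

    Assumption: identity Q/K/V projections, so no learned weights.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def part_pixel_attention(feature_map, num_parts):
    """Apply self-attention among the pixels of each horizontal part.

    feature_map: nested lists of shape H x W x C.
    num_parts:   number of horizontal stripes (hypothetical parameter).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_part = height // num_parts
    output = []
    for p in range(num_parts):
        stripe = feature_map[p * rows_per_part:(p + 1) * rows_per_part]
        # Flatten the stripe into a token sequence, one token per pixel.
        tokens = [pixel for row in stripe for pixel in row]
        attended = self_attention(tokens)
        # Restore the stripe's row layout.
        for r in range(rows_per_part):
            output.append(attended[r * width:(r + 1) * width])
    return output


if __name__ == "__main__":
    fm = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]],
        [[2.0, 0.0], [0.0, 2.0]],
        [[3.0, 1.0], [1.0, 3.0]],
    ]
    out = part_pixel_attention(fm, num_parts=2)
    print(len(out), len(out[0]), len(out[0][0]))
```

In this sketch a pixel in the upper stripe never attends to a pixel in the lower stripe, so changing the lower rows leaves the upper output untouched. A real model would add learned projections, multiple heads and some mechanism for exchanging information between parts.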





Acknowledgements
This work was partially supported by the Fundamental Research Funds for the Central Universities (No. JUSRP41908), the National Natural Science Foundation of China (Nos. 62371209, 62371208, 61362030 and 61201429), the China Postdoctoral Science Foundation (Nos. 2015M581720 and 2016M600360), and 111 Projects under Grant No. B12018.
Author information
Contributions
JK and HZ contributed significantly to analysis and manuscript preparation. MJ and TL performed the data analyses and wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
About this article
Cite this article
Kong, J., Zhou, H., Jiang, M. et al. Part-pixel transformer with smooth alignment fusion for domain adaptation person re-identification. SIViP 18, 3737–3744 (2024). https://doi.org/10.1007/s11760-024-03037-z
DOI: https://doi.org/10.1007/s11760-024-03037-z