A Local-Global Self-attention Interaction Network for RGB-D Cross-Modal Person Re-identification

Zhu, Chuanlei; Li, Xiaohong; Qi, Meibin; Liu, Yimin; Zhang, Long

doi:10.1007/978-3-031-18916-6_8

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13537))

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

The RGB-D cross-modal person re-identification (Re-ID) task aims to match person images across the RGB and depth modalities. The task is challenging because of the large discrepancy between the two modalities, in addition to common Re-ID issues such as lighting conditions, human posture, and camera angle. Few studies currently address this task, and existing Re-ID methods tend to learn homogeneous structural relationships within an image, which limits their discriminability and makes them weakly robust to noisy images. In this paper, we propose a Local-Global Interaction Network dedicated to cross-modal problems. The network constrains the center distance between the two modalities and improves intra-class cross-modality similarity. It also learns local and global features of the different modalities to enrich the features extracted from each modality. We validate the effectiveness of our approach on public benchmark datasets. Experimental results demonstrate that our method outperforms other state-of-the-art methods in terms of visual quality and quantitative measurement.
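
The abstract does not give the form of the center-distance constraint, so the following is only a minimal sketch of one common way such a constraint could be written: for each identity, average the RGB features and the depth features separately, then penalize the squared distance between the two centers. The function name, argument names, and the choice of squared Euclidean distance are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def center_distance_loss(rgb_feats, depth_feats, rgb_ids, depth_ids):
    """Mean squared distance between per-identity RGB and depth feature centers.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rgb_feats = np.asarray(rgb_feats, dtype=float)
    depth_feats = np.asarray(depth_feats, dtype=float)
    rgb_ids = np.asarray(rgb_ids)
    depth_ids = np.asarray(depth_ids)

    # Only identities observed in both modalities can be compared.
    shared_ids = np.intersect1d(rgb_ids, depth_ids)
    if shared_ids.size == 0:
        return 0.0

    total = 0.0
    for pid in shared_ids:
        # Feature center of this identity in each modality.
        rgb_center = rgb_feats[rgb_ids == pid].mean(axis=0)
        depth_center = depth_feats[depth_ids == pid].mean(axis=0)
        # Squared Euclidean distance between the two centers (assumed metric).
        total += float(np.sum((rgb_center - depth_center) ** 2))
    return total / shared_ids.size


# Toy example: two identities, 2-D features.
rgb = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
rgb_labels = [0, 0, 1]
depth = [[1.0, 4.0], [1.0, 1.0]]
depth_labels = [0, 1]
loss = center_distance_loss(rgb, depth, rgb_labels, depth_labels)
print(loss)  # 8.0
```

In the toy example, identity 0 has centers (1, 0) and (1, 4), giving a squared distance of 16, while identity 1 has identical centers, so the mean over the two identities is 8. Minimizing a term of this kind pulls same-identity features from the two modalities toward each other, which is the intra-class cross-modality similarity the abstract refers to.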

Acknowledgements

This work was supported by the Hefei Municipal Natural Science Foundation under Grant No. 2021050 and by the National Natural Science Foundation of China under Grant No. 62172137.

Author information

Authors and Affiliations

Hefei University of Technology, Anhui, 230009, China
Chuanlei Zhu, Xiaohong Li, Meibin Qi, Yimin Liu & Long Zhang

Corresponding author

Correspondence to Xiaohong Li.

Editor information

Editors and Affiliations

Southern University of Science and Technology, Shenzhen, China
Shiqi Yu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaoxiang Zhang
Hong Kong Baptist University, Hong Kong, China
Pong C. Yuen
Northwestern Polytechnical University, Xi'an, China
Junwei Han
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hong Kong Baptist University, Hong Kong, China
Yike Guo
Sun Yat-sen University, Guangzhou, China
Jianhuang Lai
Southern University of Science and Technology, Shenzhen, China
Jianguo Zhang

About this paper

Cite this paper

Zhu, C., Li, X., Qi, M., Liu, Y., Zhang, L. (2022). A Local-Global Self-attention Interaction Network for RGB-D Cross-Modal Person Re-identification. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13537. Springer, Cham. https://doi.org/10.1007/978-3-031-18916-6_8

DOI: https://doi.org/10.1007/978-3-031-18916-6_8
Published: 27 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18915-9
Online ISBN: 978-3-031-18916-6
eBook Packages: Computer Science (R0)
