H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification

Pan, Wenjie; Wu, Hanxiao; Zhu, Jianqing; Zeng, Huanqiang; Zhu, Xiaobin

doi:10.1007/978-3-031-20497-5_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Vehicle re-identification (ReID) is a critical technology for smart cities and has drawn much attention. Many studies focus on single-modal (i.e., visible) vehicle re-identification, which tends to degrade under poor illumination. Multi-modal vehicle re-identification that combines visible, near-infrared, and thermal-infrared data is therefore worth studying. This paper proposes a hybrid vision transformer (H-ViT) for multi-modal vehicle re-identification. The proposed H-ViT has two new modules: (1) a modal-specific controller (MC) and (2) a modal information embedding (MIE) structure. During feature extraction, the MC flexibly specifies modal-specific layers for data of different modalities and controls whether the position embedding is shared, which alleviates the difficulty posed by heterogeneous modalities. The MIE structure learns inter- and intra-modal information to reduce feature deviations caused by modal variations. Experimental results on the multi-modal vehicle re-identification datasets RGBNT100 and RGBN300 show that H-ViT, by integrating the MC and MIE modules, outperforms existing algorithms.
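The role the abstract assigns to the MC (choosing which layers are modal-specific and whether the position embedding is shared) can be illustrated with a small routing sketch. This is not the authors' implementation: the class `ModalController`, its fields, the modality labels `rgb`, `nir`, and `tir`, and the parameter-key naming are all hypothetical, and no actual transformer computation is performed. The sketch only shows how such a controller might decide which weight set each modality uses.

```python
from dataclasses import dataclass

# Assumed labels for the visible, near-infrared, and thermal-infrared modalities.
MODALITIES = ("rgb", "nir", "tir")


@dataclass(frozen=True)
class ModalController:
    """Hypothetical controller deciding which weights each modality uses."""
    num_layers: int
    specific_layers: frozenset = frozenset()  # layers with per-modality weights
    share_pos_embed: bool = True              # one position embedding for all?

    def _check(self, modality: str) -> None:
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality {modality!r}")

    def layer_key(self, layer: int, modality: str) -> str:
        """Name of the weight set that `modality` uses at `layer`."""
        self._check(modality)
        if not 0 <= layer < self.num_layers:
            raise IndexError(f"layer {layer} out of range")
        owner = modality if layer in self.specific_layers else "shared"
        return f"block{layer}.{owner}"

    def pos_embed_key(self, modality: str) -> str:
        """Name of the position embedding that `modality` uses."""
        self._check(modality)
        return "pos_embed.shared" if self.share_pos_embed else f"pos_embed.{modality}"

    def route(self, modality: str) -> list:
        """Ordered weight-set names applied to an input of `modality`."""
        blocks = [self.layer_key(i, modality) for i in range(self.num_layers)]
        return [self.pos_embed_key(modality)] + blocks


def count_parameter_sets(mc: ModalController) -> int:
    """Number of distinct weight sets needed across all modalities."""
    keys = set()
    for modality in MODALITIES:
        keys.update(mc.route(modality))
    return len(keys)


# Example: the first two layers and the position embedding are modal-specific.
mc = ModalController(num_layers=4, specific_layers=frozenset({0, 1}),
                     share_pos_embed=False)
nir_route = mc.route("nir")
total_sets = count_parameter_sets(mc)
```

In this configuration the near-infrared input passes through its own position embedding and its own first two blocks, then joins the other modalities in the shared blocks. Changing `specific_layers` or `share_pos_embed` moves the point at which the modalities start sharing parameters.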

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2019YFB1405900, in part by the National Natural Science Foundation of China under Grants 61976098, 61871434, 61876178, and 61901183, in part by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province under Grant 2022J06023, and in part by the Collaborative Innovation Platform Project of the Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone under Grant 2021FX03.

Author information

Authors and Affiliations

College of Engineering, Huaqiao University, Quanzhou, 362021, China
Wenjie Pan, Jianqing Zhu & Huanqiang Zeng
College of Information Science and Engineering, Huaqiao University, Xiamen, 361021, China
Hanxiao Wu
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Xiaobin Zhu

Corresponding author

Correspondence to Jianqing Zhu.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Xiaomi Inc., Beijing, China
Daniel Povey
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
JD Explore Academy, Beijing, China
Tao Mei
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Pan, W., Wu, H., Zhu, J., Zeng, H., Zhu, X. (2022). H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_21

DOI: https://doi.org/10.1007/978-3-031-20497-5_21
Published: 17 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)
