Abstract
Vehicle re-identification (ReID) is a critical technology for smart cities and has drawn much attention. Many studies focus on single-modal (i.e., visible) vehicle re-identification, whose performance tends to deteriorate under poor illumination. Multi-modal vehicle re-identification that combines visible, near-infrared, and thermal-infrared data is therefore worth studying. This paper proposes a hybrid vision transformer (H-ViT) for multi-modal vehicle re-identification. H-ViT has two new modules: (1) a modal-specific controller (MC) and (2) a modal information embedding (MIE) structure. During feature extraction, the MC flexibly specifies modal-specific layers for the different modalities and controls whether the position embedding is shared, which alleviates the difficulty caused by heterogeneous modalities. The MIE structure learns inter- and intra-modal information to reduce feature deviations caused by modal variations. Experimental results on multi-modal vehicle re-identification datasets (i.e., RGBNT100 and RGBN300) show that H-ViT, by integrating the MC and MIE modules, outperforms existing algorithms.
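The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of two ideas it names: a switch that decides whether the position embedding is shared across modalities, and a per-modality embedding added to every patch token. The class `ModalEmbedder`, the `share_position` flag, the modality keys, and the use of simple addition are all assumptions made for illustration and are not the authors' implementation. Random values stand in for learnable parameters.

```python
import random

# Assumed modality keys: visible, near-infrared, thermal-infrared.
MODALITIES = ("rgb", "nir", "tir")


def make_table(rows, dim, rng):
    """Stand-in for a learnable embedding table (random values, no training)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


class ModalEmbedder:
    """Hypothetical sketch: adds position and modality embeddings to patch tokens."""

    def __init__(self, num_patches, dim, share_position=True, seed=0):
        rng = random.Random(seed)
        shared = make_table(num_patches, dim, rng)
        # share_position=True: all modalities use the same position table.
        # share_position=False: each modality gets its own table.
        self.position = {
            m: shared if share_position else make_table(num_patches, dim, rng)
            for m in MODALITIES
        }
        # One vector per modality, added to every token of that modality.
        self.modal = {m: make_table(1, dim, rng)[0] for m in MODALITIES}

    def embed(self, tokens, modality):
        """Return tokens + position embedding + modality embedding."""
        pos = self.position[modality]
        mod = self.modal[modality]
        return [
            [t + p + e for t, p, e in zip(token, pos_row, mod)]
            for token, pos_row in zip(tokens, pos)
        ]


# Usage: the same (all-zero) patch tokens embedded as two different modalities.
embedder = ModalEmbedder(num_patches=4, dim=8, share_position=True)
tokens = [[0.0] * 8 for _ in range(4)]
rgb = embedder.embed(tokens, "rgb")
tir = embedder.embed(tokens, "tir")
```

With a shared position table, the two outputs differ only through the modality vectors, which is the kind of explicit modality cue such an embedding could provide to later transformer layers.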
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2019YFB1405900, in part by the National Natural Science Foundation of China under Grants 61976098, 61871434, 61876178, and 61901183, in part by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province under Grant 2022J06023, and in part by the Collaborative Innovation Platform Project of the Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone under Grant 2021FX03.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pan, W., Wu, H., Zhu, J., Zeng, H., Zhu, X. (2022). H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_21
DOI: https://doi.org/10.1007/978-3-031-20497-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)