H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

Vehicle re-identification (ReID) is a critical technology for smart cities and has drawn much attention. Many studies focus on single-modal (i.e., visible) vehicle re-identification, whose performance tends to deteriorate under poor illumination. Multi-modal vehicle re-identification using visible, near-infrared, and thermal-infrared images is therefore worth studying. This paper proposes a hybrid vision transformer (H-ViT) for multi-modal vehicle re-identification. H-ViT introduces two new modules: (1) a modal-specific controller (MC) and (2) a modal information embedding (MIE) structure. During feature extraction, the MC flexibly specifies which layers are modal-specific for the different modalities and controls whether the position embedding is shared, which alleviates the difficulty caused by heterogeneous modalities. The MIE structure learns inter- and intra-modal information to reduce feature deviations caused by modal variations. Experimental results show that, by integrating the MC and MIE modules, H-ViT outperforms existing algorithms on the multi-modal vehicle re-identification datasets RGBNT100 and RGBN300.
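The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' method. It assumes the MC can be modeled as a routing table that says, per layer, whether a modality uses shared or modality-specific weights, plus a flag for sharing the position embedding. It also assumes the MIE adds one learned vector per modality to every token. The names `ModalController`, `plan`, `position_owner`, and `embed_modal_information` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# The three modalities named in the abstract.
MODALITIES = ("visible", "near_infrared", "thermal_infrared")


@dataclass(frozen=True)
class ModalController:
    """Hypothetical stand-in for the MC: picks the weight set each modality uses."""
    num_layers: int
    specific_layers: FrozenSet[int]  # layers with separate weights per modality (assumed)
    share_position_embedding: bool   # whether all modalities share one position embedding

    def layer_owner(self, layer: int, modality: str) -> str:
        # A modal-specific layer keeps its own weights per modality;
        # every other layer is shared across modalities.
        return modality if layer in self.specific_layers else "shared"

    def position_owner(self, modality: str) -> str:
        return "shared" if self.share_position_embedding else modality

    def plan(self, modality: str) -> List[str]:
        # Weight-set name used at each layer for this modality.
        return [self.layer_owner(i, modality) for i in range(self.num_layers)]


def embed_modal_information(tokens, modal_embedding):
    """Add one modality vector to every token (an assumed additive form of MIE)."""
    return [[t + m for t, m in zip(token, modal_embedding)] for token in tokens]


controller = ModalController(
    num_layers=4,
    specific_layers=frozenset({0, 1}),
    share_position_embedding=False,
)
plan = controller.plan("thermal_infrared")
# plan == ["thermal_infrared", "thermal_infrared", "shared", "shared"]

tokens = [[0.0, 1.0], [2.0, 3.0]]
shifted = embed_modal_information(tokens, [0.5, -0.5])
# shifted == [[0.5, 0.5], [2.5, 2.5]]
```

In this sketch the first two layers are modality-specific and the rest are shared; moving indices in or out of `specific_layers` corresponds to the flexibility the abstract attributes to the MC.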

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2019YFB1405900, in part by the National Natural Science Foundation of China under Grants 61976098, 61871434, 61876178, and 61901183, in part by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province under Grant 2022J06023, and in part by the Collaborative Innovation Platform Project of Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone under Grant 2021FX03.

Author information

Corresponding author

Correspondence to Jianqing Zhu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pan, W., Wu, H., Zhu, J., Zeng, H., Zhu, X. (2022). H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_21

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
