Abstract
Vehicle re-identification (Re-ID) aims to match vehicle images of the same identity captured by non-overlapping surveillance cameras. Most existing vehicle Re-ID methods focus on effective deep network architectures that extract discriminative features from single-scale images. However, these methods ignore the complementary information available at different scales, which is a crucial factor in computer vision tasks. The attention mechanism, a commonly used technique in recognition and detection tasks, can selectively focus on discriminative local cues in an image. In this work, we propose a multi-scale attention framework that jointly exploits a multi-scale mechanism and an attention technique for vehicle Re-ID. Specifically, we apply the multi-scale mechanism to feature maps, which yields more comprehensive representations by fusing global and local cues. Meanwhile, we apply attention blocks to each scale subnetwork to mine complementary and discriminative information. We conduct extensive experiments on three vehicle datasets: VeRi-776, VehicleID and PKU-VD. The promising results demonstrate the effectiveness of the proposed method and set a new state of the art for vehicle Re-ID.
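As a rough illustration of the general idea of combining several feature-map scales with per-scale attention, the following NumPy sketch may help. It is not the architecture proposed in the paper, whose details are not given in the abstract. The function names, the squeeze-and-excitation style channel gate, the average-pooling downsampling, the pooling factors, and the random weights are all assumptions made for this example.

```python
import numpy as np

def avg_pool(feat, factor):
    """Downsample a (C, H, W) feature map by non-overlapping average pooling."""
    c, h, w = feat.shape
    h2, w2 = h // factor, w // factor
    cropped = feat[:, :h2 * factor, :w2 * factor]
    return cropped.reshape(c, h2, factor, w2, factor).mean(axis=(2, 4))

def channel_attention(feat, w1, w2):
    """SE-style gate (assumed here): rescale each channel by a weight in (0, 1)."""
    squeezed = feat.mean(axis=(1, 2))            # global average pool, shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck, shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, shape (C,)
    return feat * gate[:, None, None]

def multi_scale_embedding(feat, factors, weights):
    """Apply attention at each scale, pool to a vector, and concatenate the scales."""
    parts = []
    for factor, (w1, w2) in zip(factors, weights):
        scaled = avg_pool(feat, factor)               # one branch per scale
        attended = channel_attention(scaled, w1, w2)  # per-scale attention block
        parts.append(attended.max(axis=(1, 2)))       # global max pool, shape (C,)
    return np.concatenate(parts)

# Hypothetical toy input: 8 channels on a 16x16 grid, three scales.
rng = np.random.default_rng(0)
channels, reduction = 8, 2
feat = rng.standard_normal((channels, 16, 16))
factors = [1, 2, 4]
weights = [
    (rng.standard_normal((channels // reduction, channels)),
     rng.standard_normal((channels, channels // reduction)))
    for _ in factors
]
embedding = multi_scale_embedding(feat, factors, weights)
print(embedding.shape)  # (24,): 3 scales x 8 channels
```

In this sketch each scale keeps its own attention weights, so coarse branches can emphasize different channels than fine ones before the descriptors are concatenated. A trained model would learn these weights rather than sample them at random.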
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61976002, 61860206004), the Natural Science Foundation of Anhui Higher Education Institutions of China (KJ2019A0033), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201900046).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zheng, A., Lin, X., Dong, J. et al. Multi-scale attention vehicle re-identification. Neural Comput & Applic 32, 17489–17503 (2020). https://doi.org/10.1007/s00521-020-05108-x
DOI: https://doi.org/10.1007/s00521-020-05108-x