Abstract
Video-based person re-identification (ReID) aims to match pedestrians across different cameras in a large video gallery. However, interference factors that are common in real-world scenarios, such as occlusion, pose variations and new appearances, make ReID a challenging task. Most existing methods learn the features of each frame independently, without using the complementary information between different frames, so the extracted frame features are not discriminative enough to handle these problems. In this paper, we propose a novel dual-constrained guided network (DCGN) that captures discriminative features by modeling the relations across frames in two steps. First, to learn discriminative frame-level features, we design a frame-constrained module (FCM) that learns channel attention weights by combining intra-frame and inter-frame information. Next, we propose a sequence-constrained module (SCM) to determine the importance of each frame in a video. This module models the relations between the frame-level features and the sequence-level features, alleviating frame redundancy from a global perspective. We conduct comparison experiments on four representative datasets, i.e., MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID2011. In particular, the Rank-1 accuracy reaches 89.65%, 95.35%, 78.51% and 90.82% on the four datasets, outperforming the second-best method by 2.35%, 1.35%, 3.41% and 2.72%, respectively.
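The two-step idea can be illustrated with a minimal, parameter-free sketch. This is not the DCGN implementation: the abstract does not specify the layers, so the sigmoid gate, the dot-product frame scores and all function names below are assumptions chosen for illustration only. Each frame is assumed to be already reduced to a vector of per-channel descriptors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Subtract the maximum for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mean_over_frames(frames):
    """Average T channel vectors into one sequence-level vector."""
    t = len(frames)
    return [sum(column) / t for column in zip(*frames)]


def frame_constrained_reweight(frames):
    """FCM-like step (assumed form): gate every channel of a frame using the
    frame's own value (intra-frame) plus the mean over all frames (inter-frame).
    A real module would use learned layers instead of this fixed gate."""
    context = mean_over_frames(frames)
    refined = []
    for frame in frames:
        gates = [sigmoid(x + c) for x, c in zip(frame, context)]
        refined.append([g * x for g, x in zip(gates, frame)])
    return refined


def sequence_constrained_pool(frames):
    """SCM-like step (assumed form): score each frame by its dot product with
    the sequence-level feature, then pool frames with softmax weights."""
    sequence = mean_over_frames(frames)
    scores = [sum(a * b for a, b in zip(frame, sequence)) for frame in frames]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frames))
        for i in range(len(sequence))
    ]
    return pooled, weights


def dual_constrained_sketch(frames):
    """Apply the frame-level step, then the sequence-level step."""
    return sequence_constrained_pool(frame_constrained_reweight(frames))


if __name__ == "__main__":
    # Three frames, two channels; the third frame disagrees with the others.
    video = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    feature, frame_weights = dual_constrained_sketch(video)
    print(feature, frame_weights)
```

In this toy input, the frame that deviates from the rest of the sequence receives a smaller pooling weight, which is the kind of redundancy and outlier handling the sequence-level step is meant to provide.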






Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61771180 and Grant 61876056, and in part by the Innovation Fund of Anhui Siliepoch Technology Co., Ltd. The authors would like to thank the anonymous reviewers for their valuable advice and constructive criticism.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, C., Qi, M., Huang, G. et al. Learning discriminative features with a dual-constrained guided network for video-based person re-identification. Multimed Tools Appl 80, 28673–28696 (2021). https://doi.org/10.1007/s11042-021-11072-y
DOI: https://doi.org/10.1007/s11042-021-11072-y