
Learning discriminative features with a dual-constrained guided network for video-based person re-identification

Published in Multimedia Tools and Applications

Abstract

Video-based person re-identification (ReID) aims to match pedestrians across different cameras in a large video gallery. However, interference factors common in real-world scenarios, such as occlusion, pose variations and new appearances, make ReID a challenging task. Most existing methods learn the features of each frame independently, without exploiting the complementary information between frames, so the extracted frame features lack the discriminability needed to handle these problems. In this paper, we propose a novel dual-constrained guided network (DCGN) that captures discriminative features by modeling relations across frames in two steps. First, to learn frame-level discriminative features, we design a frame-constrained module (FCM) that learns channel attention weights by combining intra-frame and inter-frame information. Second, we propose a sequence-constrained module (SCM) that determines the importance of each frame in a video by modeling the relations between frame-level and sequence-level features, alleviating frame redundancy from a global perspective. We conduct comparison experiments on four representative datasets: MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID2011. Rank-1 accuracy reaches 89.65%, 95.35%, 78.51% and 90.82% on the four datasets respectively, outperforming the second-best method by 2.35%, 1.35%, 3.41% and 2.72%.
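The abstract only outlines the two modules, so the following is a rough NumPy sketch of the underlying ideas rather than the paper's actual architecture: channel attention weights obtained by fusing per-frame (intra-frame) and cross-frame (inter-frame) statistics, followed by frame weighting based on similarity to a sequence-level feature. The function names, the additive fusion, and the cosine-similarity scoring are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_constrained_weights(feats):
    """FCM-style sketch: channel attention per frame.

    feats: (T, C) per-frame channel descriptors (e.g. from global pooling).
    Combines each frame's own statistics (intra-frame) with the mean over
    all frames (inter-frame); the fusion here is a simple sum (assumption).
    """
    intra = feats
    inter = feats.mean(axis=0, keepdims=True)  # shared cross-frame statistic
    return sigmoid(intra + inter)              # (T, C) weights in (0, 1)

def sequence_constrained_scores(feats):
    """SCM-style sketch: importance of each frame in the sequence.

    Scores each frame by cosine similarity to the sequence-level feature
    (here, the temporal mean), then normalizes with a softmax so redundant
    frames receive lower weight.
    """
    seq = feats.mean(axis=0)
    sims = feats @ seq / (np.linalg.norm(feats, axis=1)
                          * np.linalg.norm(seq) + 1e-8)
    e = np.exp(sims - sims.max())
    return e / e.sum()                         # (T,) sums to 1

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8))           # 4 frames, 8 channels
attended = frames * frame_constrained_weights(frames)
scores = sequence_constrained_scores(attended)
video_feature = scores @ attended              # weighted aggregation -> (8,)
```

In the paper both modules are trained end-to-end inside a CNN backbone; this sketch only shows the shape of the computation, with hand-rolled statistics standing in for learned parameters.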





Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61771180 and 61876056, and by the Innovation Fund of Anhui Siliepoch Technology Co., Ltd. The authors would like to thank the anonymous reviewers for their valuable advice and constructive criticism.

Author information


Corresponding author

Correspondence to Cuiqun Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, C., Qi, M., Huang, G. et al. Learning discriminative features with a dual-constrained guided network for video-based person re-identification. Multimed Tools Appl 80, 28673–28696 (2021). https://doi.org/10.1007/s11042-021-11072-y

