
Video person re-identification based on RGB triple pyramid model

  • Original article

Abstract

To address the difficult problem of extracting pedestrian motion information from video, we propose a novel video action information extraction model, the RGB triple pyramid model. First, the model extracts action information separately from the R, G, and B channels of the RGB image and integrates the three parts to obtain complete action information. Second, two fusion stages with different methods and purposes are introduced. In fusion stage I, the R, G, and B action information is fused into complete person motion information. In fusion stage II, the action information is integrated with the appearance information, so that action cues are available when processing appearance and complement the overall appearance representation. Finally, we improve the triplet loss training procedure and apply it to video person re-identification. The video triplet loss includes not only an intra-video distance metric loss and an inter-video distance metric loss, but also action and appearance losses computed both within and across videos. Extensive experiments on the large-scale MARS, iLIDS-VID, and PRID-2011 datasets demonstrate that the proposed method achieves state-of-the-art performance.
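To make the described loss structure concrete, the following is a minimal, hypothetical sketch of how a video triplet loss of this kind could be assembled, assuming per-frame appearance and action features are already available as tensors. The per-channel temporal differencing, pooling choices, margins, and weights are illustrative stand-ins, not the authors' implementation.

```python
# Minimal, hypothetical sketch (not the authors' code): per-channel motion
# fusion and a video triplet loss with intra- and inter-video terms for both
# the action and appearance streams. Shapes, pooling, margins, and weights
# are illustrative assumptions.
import torch
import torch.nn.functional as F


def per_channel_motion(frames: torch.Tensor) -> torch.Tensor:
    """Crude stand-in for fusion stage I: frames is a (B, T, 3, H, W) RGB clip.
    Temporal differences are taken separately on R, G, B and then merged."""
    diffs = (frames[:, 1:] - frames[:, :-1]).abs()            # (B, T-1, 3, H, W)
    return diffs[:, :, 0] + diffs[:, :, 1] + diffs[:, :, 2]   # fused motion map


def intra_video_compactness(frame_feats: torch.Tensor) -> torch.Tensor:
    """Intra-video term: keep per-frame features (B, T, D) close to their mean."""
    center = frame_feats.mean(dim=1, keepdim=True)
    return (frame_feats - center).pow(2).sum(dim=-1).mean()


def inter_video_triplet(anchor, positive, negative, margin=0.3):
    """Inter-video term: triplet hinge on pooled video embeddings of shape (B, D)."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()


def video_triplet_loss(appearance, action, margin=0.3, w_intra=0.1):
    """appearance / action: dicts with 'anchor', 'positive', 'negative' entries,
    each a (B, T, D) tensor of per-frame features for that stream."""
    loss = torch.zeros(())
    for stream in (appearance, action):
        pooled = {k: v.mean(dim=1) for k, v in stream.items()}  # temporal average pooling
        loss = loss + inter_video_triplet(
            pooled["anchor"], pooled["positive"], pooled["negative"], margin
        )
        loss = loss + w_intra * sum(intra_video_compactness(v) for v in stream.values())
    return loss


# Toy usage with random features standing in for network outputs.
B, T, D = 4, 6, 128
streams = {
    name: {k: torch.randn(B, T, D, requires_grad=True)
           for k in ("anchor", "positive", "negative")}
    for name in ("appearance", "action")
}
total = video_triplet_loss(streams["appearance"], streams["action"])
total.backward()
```

In the paper, these terms would be computed on features produced by the pyramid model rather than on random tensors, and the relative weights would be tuned; the sketch only fixes the overall structure of a loss that mixes intra-video, inter-video, action, and appearance components.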





Author information

Corresponding author

Correspondence to Dan Wei.

Ethics declarations

Conflict of interest

This paper does not contain any studies with human or animal subjects, and all authors declare that they have no conflict of interest. This work was supported by the National Natural Science Foundation of China under Grant 62101314.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, D., Wang, Z. & Luo, Y. Video person re-identification based on RGB triple pyramid model. Vis Comput 39, 501–517 (2023). https://doi.org/10.1007/s00371-021-02344-7
