Abstract
To address the difficult problem of extracting pedestrian motion from video, we propose a novel video action-information extraction model, the RGB triple pyramid model. First, the model extracts action information separately from the R, G, and B channels of each RGB frame and integrates the three parts to obtain the complete action information. Second, two fusion stages are defined, with different methods and functions in each stage. In fusion stage I, the R, G, and B action information is fused into complete person motion information. In fusion stage II, the action information is integrated into the appearance information, so that motion is taken into account when processing appearance and the overall appearance representation is complemented. Finally, we improve the parameter-training method of the triplet loss and apply triplet-loss training to video person re-identification. The video triplet loss includes not only an intra-video and an inter-video distance-metric loss, but also intra- and inter-video action losses and intra- and inter-video appearance losses. Extensive experiments on the large-scale MARS, iLIDS-VID, and PRID-2011 datasets demonstrate that the proposed method achieves state-of-the-art performance.
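The two-stream scheme described in the abstract can be sketched minimally. The frame-differencing motion extractor, the max-fusion of the three channel motion maps, and the equal weighting of the action and appearance triplet terms below are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def fuse_rgb_motion(frames):
    """Fusion stage I sketch: per-channel motion maps merged into one.

    `frames` has shape (T, H, W, 3). Absolute frame differencing and
    channel-wise max-fusion are assumptions made for illustration.
    """
    per_channel = np.abs(np.diff(frames, axis=0))  # motion in R, G, B separately
    return per_channel.max(axis=-1)                # merge the three channel maps

def triplet_term(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss: pull the positive closer than the negative by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def video_triplet_loss(action, appearance, margin=0.3):
    """Video triplet loss sketch: an action-stream term plus an
    appearance-stream term. Each argument is a dict with anchor 'a',
    positive 'p' (same identity), and negative 'n' (different identity)
    feature vectors; equal weighting of the two streams is an assumption.
    """
    return (triplet_term(action['a'], action['p'], action['n'], margin)
            + triplet_term(appearance['a'], appearance['p'], appearance['n'], margin))
```

A well-separated triplet (positive much closer than the negative plus the margin) contributes zero loss, so only violating triplets drive the gradient during training.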
Ethics declarations
Conflict of interest
This paper does not contain any studies with human or animal subjects, and all authors declare that they have no conflict of interest. This work was supported by the National Science Foundation of China under Grant 62101314.
Cite this article
Wei, D., Wang, Z. & Luo, Y. Video person re-identification based on RGB triple pyramid model. Vis Comput 39, 501–517 (2023). https://doi.org/10.1007/s00371-021-02344-7