Abstract
To address the difficult problem of extracting pedestrian motion from video, we propose a novel video action-information extraction model, the RGB triple pyramid model. First, the model extracts action information separately from the R, G, and B channels of each RGB frame and integrates the three parts to obtain the complete action information. Second, two fusion stages are defined, with different methods and functions in each stage. In fusion stage I, the R, G, and B action information is fused into complete person motion information. In fusion stage II, the action information is integrated into the appearance information, so that motion is taken into account when processing appearance and the overall appearance representation is complemented. Finally, we improve the parameter-training method of the triplet loss and apply triplet-loss training to video person re-identification. The video triplet loss includes not only an intra-video and an inter-video distance-metric loss, but also intra- and inter-video action losses and intra- and inter-video appearance losses. Extensive experiments on the large-scale MARS, iLIDS-VID, and PRID-2011 datasets demonstrate that the proposed method achieves state-of-the-art performance.
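The two-stream scheme described in the abstract can be sketched minimally. The frame-differencing motion extractor, the max-fusion of the three channel motion maps, and the equal weighting of the action and appearance triplet terms below are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def fuse_rgb_motion(frames):
    """Fusion stage I sketch: per-channel motion maps merged into one.

    `frames` has shape (T, H, W, 3). Absolute frame differencing and
    channel-wise max-fusion are assumptions made for illustration.
    """
    per_channel = np.abs(np.diff(frames, axis=0))  # motion in R, G, B separately
    return per_channel.max(axis=-1)                # merge the three channel maps

def triplet_term(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss: pull the positive closer than the negative by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def video_triplet_loss(action, appearance, margin=0.3):
    """Video triplet loss sketch: an action-stream term plus an
    appearance-stream term. Each argument is a dict with anchor 'a',
    positive 'p' (same identity), and negative 'n' (different identity)
    feature vectors; equal weighting of the two streams is an assumption.
    """
    return (triplet_term(action['a'], action['p'], action['n'], margin)
            + triplet_term(appearance['a'], appearance['p'], appearance['n'], margin))
```

A well-separated triplet (positive much closer than the negative plus the margin) contributes zero loss, so only violating triplets drive the gradient during training.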
Ethics declarations
Conflict of interest
This paper does not contain any studies with human or animal subjects, and all authors declare that they have no conflict of interest. This work was supported by the National Science Foundation of China under Grant 62101314.
Cite this article
Wei, D., Wang, Z. & Luo, Y. Video person re-identification based on RGB triple pyramid model. Vis Comput 39, 501–517 (2023). https://doi.org/10.1007/s00371-021-02344-7