
A Spatio-Temporal Identity Verification Method for Person-Action Instance Search in Movies

  • Conference paper
  • MultiMedia Modeling (MMM 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Abstract

As one of the challenging problems in video search, Person-Action Instance Search (P-A INS) aims to retrieve shots of a specific person performing a specific action from a massive collection of video shots. Most existing methods conduct person INS and action INS separately to compute initial person and action ranking scores, which are then directly fused to generate the final ranking list. However, direct aggregation of the two individual INS scores ignores the spatial relationship between person and action, so it cannot guarantee their identity consistency, giving rise to the identity inconsistency problem (IIP). To address the IIP, we propose a simple spatio-temporal identity verification method. Specifically, in the spatial dimension, we propose an identity consistency verification (ICV) step to revise the direct fusion score of person INS and action INS. Moreover, in the temporal dimension, we propose a double-temporal extension (DTE) operation to further improve P-A INS results. The proposed method is evaluated on the large-scale NIST TRECVID INS 2019–2021 tasks, and the experimental results show that it effectively mitigates the IIP, with performance surpassing that of the champion team in the 2019 INS task and the second-place teams in both the 2020 and 2021 INS tasks.
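To make the failure mode and the fix concrete, below is a minimal Python sketch, not the authors' implementation: it assumes a weighted-sum fusion of the two ranking scores and a bounding-box IoU test as the identity-consistency check, and every function name, weight, and threshold here is an illustrative assumption.

```python
# Sketch (illustrative assumptions only): direct fusion of person/action INS
# scores versus a fusion gated by an identity-consistency check.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def direct_fusion(person_score, action_score, w=0.5):
    # Baseline: aggregate the two ranking scores with no spatial check,
    # which can pair person A's face with person B's action in one shot.
    return w * person_score + (1 - w) * action_score

def verified_fusion(person_score, person_box, action_score, action_box,
                    w=0.5, iou_thresh=0.3):
    # ICV-style revision (sketch): keep the fused score only when the person
    # and action detections overlap enough to plausibly share one identity.
    if iou(person_box, action_box) < iou_thresh:
        return 0.0  # identities inconsistent; suppress this shot
    return w * person_score + (1 - w) * action_score
```

Direct fusion can rank a shot highly even when the recognized face and the detected action belong to different people on screen; the gated variant suppresses exactly those identity-inconsistent pairings. The DTE operation from the abstract, which extends verification along the temporal dimension, is omitted from this sketch.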

Y. Niu and J. Yang contributed equally to this work.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. U1903214, 61876135). The numerical calculations in this paper were performed on the supercomputing system at the Supercomputing Center of Wuhan University.

Author information

Corresponding author

Correspondence to Chao Liang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3185 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Niu, Y., Yang, J., Liang, C., Huang, B., Wang, Z. (2023). A Spatio-Temporal Identity Verification Method for Person-Action Instance Search in Movies. In: Dang-Nguyen, DT., et al. (eds.) MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_7

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
