research-article

Person-action Instance Search in Story Videos: An Experimental Study

Authors:
Yanrui Niu, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0000-0002-5056-1477)
Chao Liang, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0000-0002-8287-8655)
Ankang Lu, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0009-0002-7009-9205)
Baojin Huang, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0000-0002-4882-5787)
Zhongyuan Wang, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0000-0002-9796-488X)
Jiahao Guo, NERCMS, School of Computer Science, Wuhan University, China (ORCID 0009-0008-6682-7867)

ACM Transactions on Information Systems, Volume 42, Issue 2, 07 November 2023, Article No.: 46, pp 1–34, https://doi.org/10.1145/3617892

Published: 07 November 2023

Abstract

Person-Action instance search (P-A INS) aims to retrieve instances of a specific person performing a specific action, a task that appeared in the 2019–2021 INS tasks of the TREC Video Retrieval Evaluation (TRECVID). Most of the top-ranking solutions can be summarized with a Division-Fusion-Optimization (DFO) framework, in which person and action recognition scores are obtained separately, then fused, and, optionally, further optimized to generate the final ranking. However, TRECVID only evaluates the final ranking results, ignoring the effects of intermediate steps and their implementation methods. We argue that fine-grained evaluation of the intermediate steps of the DFO framework will (1) provide a quantitative analysis of how different methods perform at each intermediate step; (2) identify better design choices that improve retrieval performance; and (3) inspire new ideas for future research through limitation analysis of current techniques. In particular, we propose an indirect evaluation method motivated by the leave-one-out strategy, which finds an optimal solution surpassing the champion teams in the 2020–2021 INS tasks. Moreover, to validate the generalizability and robustness of the proposed solution under various scenarios, we construct a new large-scale P-A INS dataset and conduct comparative experiments with both the leading NIST TRECVID INS solution and the state-of-the-art P-A INS method. Finally, we discuss the limitations of our evaluation work and suggest future research directions.
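
The division and fusion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how scores are normalized or fused, so the min-max normalization, the weighted sum, the `weight` parameter, and the function names below are all assumptions chosen for illustration, not the authors' method. The optional optimization step is omitted.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two branches are comparable (assumed step)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span > 0 else 0.0 for k, v in scores.items()}


def fuse_and_rank(
    person_scores: Dict[str, float],
    action_scores: Dict[str, float],
    weight: float = 0.5,
) -> List[str]:
    """Fuse per-shot person and action scores and return shot ids, best first.

    Division: the two score dictionaries are assumed to come from separate
    person and action recognition branches.
    Fusion: a weighted sum is used here purely as a hypothetical choice.
    """
    p = min_max(person_scores)
    a = min_max(action_scores)
    shots = p.keys() & a.keys()  # only shots scored by both branches
    fused = {s: weight * p[s] + (1.0 - weight) * a[s] for s in shots}
    # Sort by descending fused score; ties are broken by shot id.
    return sorted(fused, key=lambda s: (-fused[s], s))


# Hypothetical scores for three shots.
person = {"shot1": 0.9, "shot2": 0.2, "shot3": 0.6}
action = {"shot1": 0.1, "shot2": 0.8, "shot3": 0.7}
ranking = fuse_and_rank(person, action)
```

In this toy input, a shot that scores moderately well on both branches can outrank shots that score highly on only one, which is the behavior a person-action query needs from any fusion rule.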

Index Terms

1. Information systems
  1. Information retrieval

Published in
ACM Transactions on Information Systems Volume 42, Issue 2
March 2024
897 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/3618075
Editor:
Min Zhang
Tsinghua University, China
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 November 2023
- Online AM: 29 August 2023
- Accepted: 11 August 2023
- Revised: 3 July 2023
- Received: 26 October 2022

Author Tags
Movie video
composite concepts
person-action instance search
Qualifiers
- research-article