Abstract
Egocentric action prediction aims to forecast the future actions of the camera wearer, given a partial video segment. Compared to action recognition, the main challenge in egocentric action prediction is the lack of context. Defining an egocentric action as a noun-verb combination helps extract this context information from the video. However, the scarcity of training data for two sequential deep-learning models, one detecting nouns and the other verbs, remains a challenge. We propose a model comprising two sequential GAN architectures: the first determines the noun (from the video frames) and the second determines the verb (from the optical flow). Each GAN consists of a teacher and a student network, where the teachers are much deeper CNNs (pre-trained on a large dataset) than the students. We further reduce the search space for the verb by applying the Reduced Verb Space Generator (RVSGen) algorithm between the noun-prediction and verb-prediction stages: RVSGen proposes the verbs most compatible with the noun obtained by the noun predictor. Experiments on a benchmark dataset show the efficacy of the proposed model compared to the state of the art.
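As a concrete illustration of the two-stage pipeline, the following is a minimal PyTorch sketch of the inference path only: a shallow student network predicts the noun from RGB frames, a reduced-verb-space lookup restricts the candidate verbs, and a second student scores only the remaining verbs from the optical flow. All class names, the noun-to-verb table, and tensor shapes are illustrative assumptions, not the authors' code; the teacher networks and the adversarial (GAN-based) distillation used to train the students are omitted.

    # Hedged sketch of the two-stage noun/verb prediction pipeline.
    # Names (StudentNet, rvs_gen, NOUN_TO_VERBS) are hypothetical stand-ins.
    import torch
    import torch.nn as nn

    # Hypothetical noun-to-verb compatibility table; the paper's RVSGen
    # derives the reduced verb space from subject-action relevance instead.
    VERBS = ["take", "put", "cut", "open", "close", "pour"]
    NOUN_TO_VERBS = {
        "knife": ["take", "put", "cut"],
        "fridge": ["open", "close"],
        "cup": ["take", "put", "pour"],
    }
    NOUNS = list(NOUN_TO_VERBS)

    class StudentNet(nn.Module):
        """Shallow student CNN (the distilled network in each GAN stage)."""
        def __init__(self, in_channels: int, num_classes: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(16, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(x).flatten(1))

    def rvs_gen(noun: str) -> list[str]:
        """Reduced Verb Space Generator (assumed interface): keep only
        the verbs plausible for the predicted noun."""
        return NOUN_TO_VERBS[noun]

    @torch.no_grad()
    def predict_action(frames: torch.Tensor, flow: torch.Tensor,
                       noun_net: StudentNet,
                       verb_net: StudentNet) -> tuple[str, str]:
        # Stage 1: predict the noun from the RGB frames.
        noun = NOUNS[noun_net(frames).argmax(dim=1).item()]
        # Stage 2: score verbs from optical flow, restricted to the
        # reduced verb space proposed for this noun.
        verb_scores = verb_net(flow).squeeze(0)
        allowed = [VERBS.index(v) for v in rvs_gen(noun)]
        verb = VERBS[allowed[verb_scores[allowed].argmax().item()]]
        return noun, verb

    noun_net = StudentNet(in_channels=3, num_classes=len(NOUNS))
    verb_net = StudentNet(in_channels=2, num_classes=len(VERBS))  # 2-ch flow
    frames = torch.randn(1, 3, 64, 64)  # one RGB frame (toy size)
    flow = torch.randn(1, 2, 64, 64)    # one optical-flow field
    print(predict_action(frames, flow, noun_net, verb_net))

Restricting the second classifier's argmax to the reduced verb space is one simple way to realise the "noun first, then verb" dependency the abstract describes; with untrained toy weights the printed pair is of course arbitrary.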
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mukherjee, S., Chopra, B. (2024). Egocentric Action Prediction via Knowledge Distillation and Subject-Action Relevance. In: Kaur, H., Jakhetiya, V., Goyal, P., Khanna, P., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2023. Communications in Computer and Information Science, vol 2009. Springer, Cham. https://doi.org/10.1007/978-3-031-58181-6_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58180-9
Online ISBN: 978-3-031-58181-6