Egocentric Action Prediction via Knowledge Distillation and Subject-Action Relevance

  • Conference paper
Computer Vision and Image Processing (CVIP 2023)

Abstract

Egocentric action prediction aims to predict the future actions of the performer wearing the camera, given a partially observed video segment. Compared to action recognition, the main challenge in egocentric action prediction is the lack of context. Defining an egocentric action as a noun-verb combination helps extract contextual information from the video; however, the scarcity of training data for two sequential deep-learning models, one detecting nouns and one detecting verbs, remains a challenge. We propose a model built from two sequential GAN architectures: the first determines the noun (from the frames) and the second determines the verb (from the optical flow). Each GAN consists of a teacher and a student network, where the teachers are much deeper CNNs, pre-trained on a large dataset, than the students. We further reduce the verb search space by applying the Reduced Verb Space Generator (RVSGen) algorithm between the noun-prediction and verb-prediction stages; RVSGen proposes the verbs most relevant to the noun obtained by the noun predictor. Experiments on a benchmark dataset show the efficacy of the proposed model compared to the state of the art.
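To make the pipeline concrete, the following is a minimal, hypothetical Python/PyTorch sketch of the two-stage noun-to-verb inference path with an RVSGen-style verb-space reduction. The network depths, vocabulary sizes, top-k cutoff, and the noun-verb relevance table are all illustrative assumptions, not the authors' implementation; in particular, the paper trains each student network adversarially against a much deeper pre-trained teacher, which this inference-only sketch omits.

    import torch
    import torch.nn as nn

    # Vocabulary sizes in the style of EPIC-Kitchens (assumed, not from the paper).
    NUM_NOUNS, NUM_VERBS = 352, 125

    class Student(nn.Module):
        """A shallow student CNN; in the paper each student is distilled from a
        much deeper pre-trained teacher via an adversarial (GAN) objective."""
        def __init__(self, in_channels, num_classes):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(64, num_classes)

        def forward(self, x):
            return self.head(self.features(x))

    def rvsgen_mask(noun_id, relevance, k=10):
        """RVSGen-style reduction (illustrative): keep only the k verbs most
        relevant to the predicted noun and mask the logits of all others."""
        keep = torch.topk(relevance[noun_id], k).indices
        mask = torch.full((relevance.size(1),), float("-inf"))
        mask[keep] = 0.0
        return mask

    noun_net = Student(in_channels=3, num_classes=NUM_NOUNS)  # RGB frames
    verb_net = Student(in_channels=2, num_classes=NUM_VERBS)  # optical flow (u, v)
    relevance = torch.rand(NUM_NOUNS, NUM_VERBS)  # noun-verb relevance table (assumed)

    frames = torch.randn(1, 3, 224, 224)  # a frame from the observed segment
    flow = torch.randn(1, 2, 224, 224)    # its optical-flow field

    # Stage 1: predict the noun from frames. Stage 2: predict the verb from flow,
    # restricted by RVSGen to verbs plausible for that noun.
    noun_id = noun_net(frames).argmax(dim=1).item()
    verb_logits = verb_net(flow) + rvsgen_mask(noun_id, relevance)
    verb_id = verb_logits.argmax(dim=1).item()
    print(f"predicted action: verb {verb_id} applied to noun {noun_id}")

The design point the sketch is meant to capture is that, as described in the abstract, the second stage never searches the full verb vocabulary: RVSGen masks the verb logits down to the few verbs plausible for the predicted noun, confining verb-prediction errors to a semantically sensible subset.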

Author information

Corresponding author: Snehasis Mukherjee.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mukherjee, S., Chopra, B. (2024). Egocentric Action Prediction via Knowledge Distillation and Subject-Action Relevance. In: Kaur, H., Jakhetiya, V., Goyal, P., Khanna, P., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2023. Communications in Computer and Information Science, vol 2009. Springer, Cham. https://doi.org/10.1007/978-3-031-58181-6_48

  • DOI: https://doi.org/10.1007/978-3-031-58181-6_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58180-9

  • Online ISBN: 978-3-031-58181-6

  • eBook Packages: Computer Science, Computer Science (R0)
