Towards Open-Set Egocentric Action Recognition with Uncertainty Estimation

  • Conference paper

Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15315)


Abstract

Egocentric Action Recognition (EAR) has gained significant attention owing to its wide applicability in real-world scenarios such as lifestyle analysis, medical monitoring, and industrial robotics. However, existing EAR methods rest on the closed-set assumption, so they struggle both to handle the unknown actions that inevitably arise in open-world settings and to meet the dual requirements of accuracy and reliability when making decisions. To address the open-set EAR problem, this paper presents an Open-set Egocentric Action Recognition (OpenEAR) framework that advances beyond traditional egocentric action recognition methods. OpenEAR distinguishes itself by adeptly handling unknown actions in open-world scenarios, a notable limitation of conventional EAR models. Leveraging large-scale pre-trained models and a refined architecture, OpenEAR excels at extracting semantics from egocentric videos. Its incorporation of Evidential Deep Learning (EDL) enables uncertainty estimation, enhancing prediction reliability. This approach not only recognizes known actions and objects but also quantifies prediction confidence, effectively managing unknown elements. Superior performance on the EPIC-KITCHENS-55 and EGTEA Gaze+ datasets demonstrates OpenEAR's robustness and practicality, marking a significant advance over existing methods. The OpenEAR framework is available at https://github.com/zou-y23/OpenEAR.
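The EDL component referenced in the abstract rests on the subjective-logic view of classification: the network outputs non-negative per-class evidence that parameterizes a Dirichlet distribution, and the belief mass not assigned to any class becomes an explicit uncertainty score used to flag unknown actions. The sketch below illustrates this standard formulation in PyTorch; it is a minimal illustration of the general EDL recipe, not the OpenEAR implementation, and the function name and rejection threshold are hypothetical.

```python
# Minimal sketch of Dirichlet-based uncertainty in Evidential Deep Learning
# (standard subjective-logic formulation; NOT the OpenEAR codebase).
import torch
import torch.nn.functional as F

def edl_uncertainty(logits: torch.Tensor):
    """Map raw network outputs to class probabilities and an uncertainty mass.

    logits: (batch, K) raw outputs of any video backbone's classification head.
    Returns (probs, uncertainty); uncertainty lies in (0, 1] and grows as
    total evidence shrinks, so large values flag potentially unknown actions.
    """
    K = logits.shape[-1]
    evidence = F.relu(logits)            # non-negative evidence per class
    alpha = evidence + 1.0               # Dirichlet concentration parameters
    S = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength (total evidence + K)
    probs = alpha / S                    # expected class probabilities
    uncertainty = K / S.squeeze(-1)      # vacuity: K / S
    return probs, uncertainty

# Usage: reject a clip as "unknown" when uncertainty exceeds a threshold.
logits = torch.randn(2, 10)              # e.g., 10 known action classes
probs, u = edl_uncertainty(logits)
is_unknown = u > 0.75                    # hypothetical rejection threshold
```

A single forward pass thus yields both a prediction and a calibrated rejection signal, which is what allows an open-set recognizer to answer "none of the known classes" instead of forcing a closed-set decision.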

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62376140 and U23A20315), in part by the Science and Technology Innovation Program for Distinguished Young Scholars of Shandong Province Higher Education Institutions (No. 2023KJ128), and in part by the Special Fund for Distinguished Professors of Shandong Jianzhu University.

Author information

Corresponding author

Correspondence to Meng Liu.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, Y., Nugent, C., Burns, M., Xi, X., Liu, M. (2025). Towards Open-Set Egocentric Action Recognition with Uncertainty Estimation. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_14

  • DOI: https://doi.org/10.1007/978-3-031-78354-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78353-1

  • Online ISBN: 978-3-031-78354-8

  • eBook Packages: Computer Science, Computer Science (R0)
