Abstract
Egocentric Action Recognition (EAR) has gained significant attention due to its broad applicability in lifestyle analysis, medical monitoring, industrial robotics, and other real-world scenarios. However, existing EAR methods are built on the closed-set assumption, making it difficult for them to handle the unknown actions inevitably present in open-world settings and to meet the dual requirements of accuracy and reliability when making decisions. To address this problem, this paper presents an Open-set Egocentric Action Recognition (OpenEAR) framework that advances beyond traditional EAR methods. OpenEAR distinguishes itself by adeptly handling unknown actions in open-world scenarios, a notable limitation of conventional EAR models. Leveraging large-scale pre-trained models and a refined architecture, OpenEAR excels at extracting semantics from egocentric videos. Its incorporation of Evidential Deep Learning (EDL) enables uncertainty estimation, enhancing prediction reliability. This approach not only recognizes known actions and objects but also quantifies prediction confidence, effectively managing unknown elements. Superior performance on the EPIC-KITCHENS-55 and EGTEA Gaze+ datasets underlines OpenEAR's robustness and practicality, marking a significant advance over existing methods. The OpenEAR framework is available at https://github.com/zou-y23/OpenEAR.
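For readers unfamiliar with EDL-based uncertainty estimation, the sketch below illustrates the general mechanism the abstract refers to: a classifier's logits are mapped to non-negative evidence, the evidence parameterizes a Dirichlet distribution, and the vacuity u = K/S (number of classes over Dirichlet strength) quantifies how uncertain the prediction is, so that high-uncertainty samples can be rejected as unknown. This is a minimal PyTorch sketch following the standard formulation of Sensoy et al. (2018), not the paper's actual implementation; the softplus evidence function and the rejection threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def edl_uncertainty(logits: torch.Tensor):
    """Subjective-logic uncertainty from classifier logits.

    logits: (batch, K) raw outputs of an action classification head.
    Returns belief masses, expected class probabilities, and a scalar
    uncertainty per sample in [0, 1] (1 = total ignorance).
    """
    evidence = F.softplus(logits)               # non-negative evidence e_k (assumed activation)
    alpha = evidence + 1.0                      # Dirichlet parameters alpha_k = e_k + 1
    strength = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S = sum_k alpha_k
    belief = evidence / strength                # belief mass b_k = e_k / S
    prob = alpha / strength                     # expected class probability E[p_k]
    num_classes = logits.shape[-1]
    uncertainty = num_classes / strength.squeeze(-1)  # vacuity u = K / S
    return belief, prob, uncertainty

# Samples whose uncertainty exceeds a threshold are flagged as unknown actions.
logits = torch.randn(4, 10)                     # e.g., 10 known action classes
_, prob, u = edl_uncertainty(logits)
is_unknown = u > 0.75                           # hypothetical rejection threshold
```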
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (No. 62376140 and No. U23A20315), in part by the Science and Technology Innovation Program for Distinguished Young Scholars of Shandong Province Higher Education Institutions (No. 2023KJ128), and in part by the Special Fund for Distinguished Professors of Shandong Jianzhu University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zou, Y., Nugent, C., Burns, M., Xi, X., Liu, M. (2025). Towards Open-Set Egocentric Action Recognition with Uncertainty Estimation. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_14
Print ISBN: 978-3-031-78353-1
Online ISBN: 978-3-031-78354-8