Abstract
Egocentric Action Recognition (EAR) has gained significant attention due to its broad applicability in lifestyle analysis, medical monitoring, industrial robotics, and other real-world scenarios. However, existing EAR methods are built on the closed-set assumption, making it difficult for them to handle the unknown actions inevitably present in open-world settings and to meet the dual requirements of accuracy and reliability when making decisions. To address this problem, this paper presents an Open-set Egocentric Action Recognition (OpenEAR) framework that advances beyond traditional EAR methods. OpenEAR distinguishes itself by adeptly handling unknown actions in open-world scenarios, a notable limitation of conventional EAR models. Leveraging large-scale pre-trained models and a refined architecture, OpenEAR excels at extracting semantics from egocentric videos. Its incorporation of Evidential Deep Learning (EDL) enables uncertainty estimation, enhancing prediction reliability. This approach not only recognizes known actions and objects but also quantifies prediction confidence, effectively managing unknown elements. Superior performance on the EPIC-KITCHENS-55 and EGTEA Gaze+ datasets underlines OpenEAR's robustness and practicality, marking a significant advance over existing methods. The OpenEAR framework is available at https://github.com/zou-y23/OpenEAR.
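For readers unfamiliar with EDL-based uncertainty estimation, the sketch below illustrates the general mechanism the abstract refers to: a classifier's logits are mapped to non-negative evidence, the evidence parameterizes a Dirichlet distribution, and the vacuity u = K/S (number of classes over Dirichlet strength) quantifies how uncertain the prediction is, so that high-uncertainty samples can be rejected as unknown. This is a minimal PyTorch sketch following the standard formulation of Sensoy et al. (2018), not the paper's actual implementation; the softplus evidence function and the rejection threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def edl_uncertainty(logits: torch.Tensor):
    """Subjective-logic uncertainty from classifier logits.

    logits: (batch, K) raw outputs of an action classification head.
    Returns belief masses, expected class probabilities, and a scalar
    uncertainty per sample in [0, 1] (1 = total ignorance).
    """
    evidence = F.softplus(logits)               # non-negative evidence e_k (assumed activation)
    alpha = evidence + 1.0                      # Dirichlet parameters alpha_k = e_k + 1
    strength = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S = sum_k alpha_k
    belief = evidence / strength                # belief mass b_k = e_k / S
    prob = alpha / strength                     # expected class probability E[p_k]
    num_classes = logits.shape[-1]
    uncertainty = num_classes / strength.squeeze(-1)  # vacuity u = K / S
    return belief, prob, uncertainty

# Samples whose uncertainty exceeds a threshold are flagged as unknown actions.
logits = torch.randn(4, 10)                     # e.g., 10 known action classes
_, prob, u = edl_uncertainty(logits)
is_unknown = u > 0.75                           # hypothetical rejection threshold
```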
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (No. 62376140 and No. U23A20315), in part by the Science and Technology Innovation Program for Distinguished Young Scholars of Shandong Province Higher Education Institutions (No. 2023KJ128), and in part by the Special Fund for Distinguished Professors of Shandong Jianzhu University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zou, Y., Nugent, C., Burns, M., Xi, X., Liu, M. (2025). Towards Open-Set Egocentric Action Recognition with Uncertainty Estimation. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_14
Print ISBN: 978-3-031-78353-1
Online ISBN: 978-3-031-78354-8