
Human Action Recognition in First Person Videos using Verb-Object Pairs


Abstract:

Human action recognition is important for distinguishing the rich variety of human activities in first-person videos. While egocentric action recognition has improved, the space of action categories is large, and labeling training data for every category is impractical. In this work, we decompose action models into verb and noun model pairs and propose a method to combine them with a simple fusion strategy. Specifically, we use a 3D Convolutional Neural Network (C3D) in the verb stream to model video-based features, and an object detection model (YOLO) in the noun stream to model the objects the human interacts with. We present experiments on the recently introduced large-scale EGTEA Gaze+ dataset with 106 action classes and show that our model is comparable to state-of-the-art action recognition models.
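The verb-noun decomposition described in the abstract can be sketched as a simple product-of-scores late fusion: each action class corresponds to a (verb, noun) pair, and its score is the product of the verb stream's probability and the noun stream's confidence. This is only an illustrative sketch under assumed names and shapes; the paper's exact fusion strategy may differ.

```python
import numpy as np

def fuse_verb_noun(verb_probs, noun_probs, action_pairs):
    """Combine verb-stream and noun-stream scores into action scores.

    verb_probs:   (num_verbs,) softmax output of the verb stream (e.g. C3D)
    noun_probs:   (num_nouns,) object confidences from the noun stream (e.g. YOLO)
    action_pairs: list of (verb_idx, noun_idx) tuples, one per valid action class
    All names and shapes here are assumptions for illustration.
    """
    # Score each action class as the product of its verb and noun scores.
    scores = np.array([verb_probs[v] * noun_probs[n] for v, n in action_pairs])
    # Normalize so the scores over the action classes sum to 1.
    return scores / scores.sum()

# Toy example: 2 verbs, 2 nouns, 3 valid action classes.
verb_probs = np.array([0.7, 0.3])          # e.g. "take", "cut"
noun_probs = np.array([0.6, 0.4])          # e.g. "plate", "tomato"
pairs = [(0, 0), (0, 1), (1, 1)]           # take-plate, take-tomato, cut-tomato
action_scores = fuse_verb_noun(verb_probs, noun_probs, pairs)
predicted = int(np.argmax(action_scores))  # highest-scoring action class index
```

Restricting the fusion to valid (verb, noun) pairs keeps the number of trained models linear in verbs plus nouns while still covering all 106 composed action classes.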
Date of Conference: 24-26 April 2019
Date Added to IEEE Xplore: 22 August 2019
Print on Demand (PoD) ISSN: 2165-0608
Conference Location: Sivas, Turkey