Abstract:
In this paper, we introduce a novel framework that significantly reduces the computational cost of human temporal activity recognition from egocentric videos while maintaining accuracy. We propose to apply the actor-critic model of reinforcement learning to optical flow data to locate a bounding box around a region of interest, which is then used to clip a sub-image from the video frame. We also propose to use one shallow and one deeper 3D convolutional neural network to process the original image and the clipped image region, respectively. We compare our proposed method with an approach based on 3D convolutional networks on the recently released Dataset of Multimodal Semantic Egocentric Video. Experimental results show that the proposed method reduces processing time by 36.4% while providing comparable accuracy.
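To make the described pipeline concrete, the sketch below illustrates the two ideas from the abstract: an actor-critic head that maps optical flow to a normalized bounding box, and a shallow/deeper pair of 3D CNNs that process the full frames and the cropped region of interest, respectively. This is a minimal sketch under stated assumptions; all names (RoiActorCritic, TwoBranchActivityNet, conv3d_block), layer counts, and tensor sizes are illustrative, not the authors' actual architecture.

```python
# Hypothetical sketch of the abstract's pipeline. Module names, layer
# counts, and sizes are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class RoiActorCritic(nn.Module):
    """Actor-critic head over optical flow; the actor emits a normalized box."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.actor = nn.Linear(32, 4)   # (cx, cy, w, h), squashed to [0, 1]
        self.critic = nn.Linear(32, 1)  # state-value estimate for training

    def forward(self, flow):
        # flow: (batch, 2, height, width) optical-flow field
        z = self.encoder(flow)
        return torch.sigmoid(self.actor(z)), self.critic(z)


def conv3d_block(in_ch, out_ch):
    """One 3D conv stage: conv -> batch norm -> ReLU -> spatial pooling."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
    )


class TwoBranchActivityNet(nn.Module):
    """Shallow 3D CNN on the full clip, deeper 3D CNN on the ROI crop."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.shallow = nn.Sequential(
            conv3d_block(3, 16),
            conv3d_block(16, 32),
        )
        self.deep = nn.Sequential(
            conv3d_block(3, 16),
            conv3d_block(16, 32),
            conv3d_block(32, 64),
            conv3d_block(64, 128),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.classifier = nn.Linear(32 + 128, num_classes)

    def forward(self, full_clip, roi_clip):
        # clips: (batch, channels, time, height, width)
        a = self.pool(self.shallow(full_clip)).flatten(1)
        b = self.pool(self.deep(roi_clip)).flatten(1)
        return self.classifier(torch.cat([a, b], dim=1))


if __name__ == "__main__":
    agent = RoiActorCritic()
    net = TwoBranchActivityNet(num_classes=10)
    flow = torch.randn(2, 2, 64, 64)       # optical flow for one frame pair
    box, value = agent(flow)                # box would drive the ROI crop
    full = torch.randn(2, 3, 16, 64, 64)    # downsampled full-frame clip
    roi = torch.randn(2, 3, 16, 64, 64)     # cropped ROI, resized to fixed size
    print(box.shape, value.shape, net(full, roi).shape)
```

In a design of this shape, the cheap shallow branch retains global context while the deeper branch spends its capacity only on the cropped region, which is one plausible way the reported reduction in processing time could arise.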
Date of Conference: 22-25 September 2019
Date Added to IEEE Xplore: 26 August 2019