Abstract:
Convolutional neural networks have pushed the boundaries of action recognition in videos, especially with the introduction of 3D convolutions. However, it remains an open question how efficiently a 3D CNN can model temporal information, which we investigate here, and we introduce a new optical flow representation to improve the motion stream. We take the baseline inflated 3D CNN (I3D) and separate its convolutional filters into spatial and temporal components, which reduces the number of parameters with minimal loss of accuracy. We evaluate our approach on the NTU RGB+D dataset, the largest human action dataset, and outperform the state of the art by a large margin.
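The factorization described above separates a full t x k x k 3D convolution into a 1 x k x k spatial convolution followed by a t x 1 x 1 temporal convolution. A full 3x3x3 filter bank mapping C_in to C_out channels has 27*C_in*C_out weights, whereas the factorized pair has 9*C_in*C_out + 3*C_out^2, which is where the parameter saving comes from. The sketch below is a minimal, hedged illustration of this idea in PyTorch; the module name, channel widths, and placement of the nonlinearity are assumptions for illustration, not the authors' exact I3D-based architecture.

import torch
import torch.nn as nn

class FactorizedConv3d(nn.Module):
    """Illustrative sketch: replace a full t x k x k 3D convolution with a
    spatial (1 x k x k) convolution followed by a temporal (t x 1 x 1) one."""

    def __init__(self, in_channels, out_channels, spatial_kernel=3, temporal_kernel=3):
        super().__init__()
        # Spatial convolution: operates on each frame independently.
        self.spatial = nn.Conv3d(
            in_channels, out_channels,
            kernel_size=(1, spatial_kernel, spatial_kernel),
            padding=(0, spatial_kernel // 2, spatial_kernel // 2),
        )
        # Temporal convolution: mixes information across frames only.
        self.temporal = nn.Conv3d(
            out_channels, out_channels,
            kernel_size=(temporal_kernel, 1, 1),
            padding=(temporal_kernel // 2, 0, 0),
        )
        # Nonlinearity between the two stages (an assumption, as in (2+1)D-style blocks).
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x has shape (batch, channels, time, height, width)
        return self.temporal(self.relu(self.spatial(x)))


if __name__ == "__main__":
    # Toy clip: batch of 2, 3 channels, 16 frames, 112x112 resolution.
    clip = torch.randn(2, 3, 16, 112, 112)
    layer = FactorizedConv3d(3, 64)
    print(layer(clip).shape)  # torch.Size([2, 64, 16, 112, 112])

In practice such a block would be dropped in wherever the inflated network uses a full 3D convolution, keeping the overall two-stream (RGB plus motion) design unchanged.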
Published in: 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)
Date of Conference: 12-14 December 2018