Abstract:
Recent RGB-based and depth-based approaches to human action recognition have each achieved outstanding performance, demonstrating the effectiveness of both the RGB and depth modalities for action classification; however, the two modalities are rarely considered together. Existing multimodal action recognition methods suffer from several limitations, including non-end-to-end training, violent fusion, and inefficiency. In this paper, we propose a novel joint deep learning (JDL) model that: 1) jointly optimizes the objectives of classification and feature extraction through a novel end-to-end two-stream deep learning model, 2) refines common-specific features by introducing a similarity loss constraint on high-level features, and 3) uses 2D convolution kernels instead of 3D convolution kernels during feature extraction to improve efficiency. Experiments on two challenging datasets show the promising performance of our architecture.
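To make the idea of joint optimization concrete, the following is a minimal sketch of how a classification objective for two streams might be combined with a similarity constraint on their high-level features. It is an illustration under stated assumptions, not the loss defined in the paper: the abstract does not specify the form of the similarity loss, so a mean squared difference is assumed here, and the names `similarity_loss`, `joint_loss`, and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample given class scores and the true class index."""
    return -math.log(softmax(logits)[label])


def similarity_loss(feat_rgb, feat_depth):
    """Assumed form: mean squared difference between the two streams'
    high-level feature vectors (the abstract does not give the exact form)."""
    return sum((a - b) ** 2 for a, b in zip(feat_rgb, feat_depth)) / len(feat_rgb)


def joint_loss(rgb_logits, depth_logits, feat_rgb, feat_depth, label, weight=0.1):
    """Hypothetical joint objective: per-stream classification losses plus a
    weighted similarity term, so both parts can be optimized end to end."""
    cls = cross_entropy(rgb_logits, label) + cross_entropy(depth_logits, label)
    sim = similarity_loss(feat_rgb, feat_depth)
    return cls + weight * sim
```

In this sketch, the similarity term vanishes when both streams produce identical high-level features, so the total reduces to the sum of the two classification losses; `weight` controls how strongly the streams are pulled toward a shared representation.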
Date of Conference: 09-12 December 2018
Date Added to IEEE Xplore: 25 April 2019
ISBN Information:
Print on Demand (PoD) ISSN: 1018-8770