Abstract:
Recently, very deep two-stream ConvNets have achieved great discriminative power for video classification, which is especially the case for the temporal ConvNets when trained on multi-frame optical flow. However, action recognition in videos often falls prey to wild camera motion, which poses challenges for extracting reliable optical flow for the human body. In light of this, we propose a novel method to remove the global camera motion, which explicitly calculates a homography between two consecutive frames without human detection. Given the estimated homography due to camera motion, background motion can be canceled out from the warped optical flow. We take this a step further and design a new architecture called Saliency-Context two-stream ConvNets, where the context two-stream ConvNets are employed to recognize the entire scene in video frames, whilst the saliency streams are trained on salient human motion regions that are detected from the warped optical flow. Finally, the Saliency-Context two-stream ConvNets allow us to capture complementary information and achieve state-of-the-art performance on the UCF101 dataset.
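The camera-motion cancellation step described above can be sketched as follows. Assuming a homography H between two consecutive frames has already been estimated (e.g., with RANSAC over matched feature points, as in OpenCV's `cv2.findHomography`), the background flow that H induces at every pixel can be computed and subtracted from the raw optical flow. The function names below are illustrative, not from the paper:

```python
import numpy as np


def homography_flow(H, shape):
    """Per-pixel displacement field induced by a 3x3 homography H on an
    (h, w) frame. This approximates the background motion caused by the
    camera between two consecutive frames."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Homogeneous pixel coordinates, shape (h, w, 3)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    proj = pts @ H.T                        # apply the homography
    proj = proj[..., :2] / proj[..., 2:3]   # dehomogenize
    # Displacement of each pixel under H, shape (h, w, 2) as (dx, dy)
    return proj - np.stack([xs, ys], axis=-1)


def cancel_camera_motion(flow, H):
    """Subtract the homography-induced background flow from a raw optical
    flow field of shape (h, w, 2), leaving (approximately) only the motion
    of foreground objects such as the human body."""
    return flow - homography_flow(H, flow.shape[:2])
```

With an identity homography the background flow is zero and the optical flow is unchanged; a pure-translation homography shifts every pixel by the same offset, so a constant vector is subtracted across the whole flow field.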
Date of Conference: 25-28 September 2016
Date Added to IEEE Xplore: 19 August 2016
Electronic ISSN: 2381-8549