Paper
4 March 2022 Joint motion context and clip augmentation for spatio-temporal action detection
Xurui Ma, Xiang Zhang, Chengkun Wu, Chuanfu Xu, Jie Liu, Zhigang Luo
Proceedings Volume 12084, Fourteenth International Conference on Machine Vision (ICMV 2021); 120840T (2022) https://doi.org/10.1117/12.2623446
Event: Fourteenth International Conference on Machine Vision (ICMV 2021), 2021, Rome, Italy
Abstract
This paper leverages spatio-temporal visual cues to improve video-based action detection. It proposes NOLA, an anchor-free NOn-Local Action detector that builds on the recent moving center detector (MOC) and extends it by efficiently aggregating long-range spatio-temporal information. Specifically, an efficient spatio-temporal motion-aware non-local block provides global motion context for all of MOC's prediction branches. As a byproduct, this efficiency allows large batches to be run on a resource-limited device. In addition, a lightweight data augmentation method for video-based tasks, termed clip augmentation, improves the detector's generalization ability with an inexpensive scale-and-addition operation. With both schemes enabled, NOLA still runs in real time. Experiments on two benchmark datasets show that NOLA significantly outperforms MOC and, compared with other existing methods, reaches state-of-the-art video-level mean average precision (video mAP).
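For readers unfamiliar with non-local aggregation, the general idea can be sketched as follows. This is a minimal illustration of a generic non-local operation (dot-product similarity with softmax normalisation, no learned projections) and is not the motion-aware block proposed in the paper, whose details the abstract does not give. The function names `softmax` and `non_local` and the toy `clip` data are assumptions made for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local(features):
    """Generic non-local operation over a list of feature vectors.

    Each vector stands for one (frame, position) location. Every location
    attends to all others, so the added context is global in space and time.
    """
    out = []
    for query in features:
        # Similarity of this location to every location (dot product).
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted sum of all feature vectors: the aggregated global context.
        context = [
            sum(w * key[c] for w, key in zip(weights, features))
            for c in range(len(query))
        ]
        # Residual connection: input plus aggregated context.
        out.append([a + b for a, b in zip(query, context)])
    return out

# Toy clip (hypothetical data): 2 frames x 2 positions, 2 channels each.
clip = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]]
positions = [vec for frame in clip for vec in frame]  # flatten time and space
out = non_local(positions)
```

Because the pairwise similarity is computed across all frames at once, the cost grows quadratically with the number of locations, which is why an efficient variant matters for running large batches on limited hardware.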
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xurui Ma, Xiang Zhang, Chengkun Wu, Chuanfu Xu, Jie Liu, and Zhigang Luo "Joint motion context and clip augmentation for spatio-temporal action detection", Proc. SPIE 12084, Fourteenth International Conference on Machine Vision (ICMV 2021), 120840T (4 March 2022); https://doi.org/10.1117/12.2623446
KEYWORDS
Sensors
Video
RGB color model
Optical flow
Data modeling
Feature extraction
Visualization