Abstract:
Spatio-temporal Hand Gesture Localization and Recognition (SHGLR) is the task of detecting and identifying hand gestures in a video by analyzing both the spatial and temporal aspects of hand movements. Current state-of-the-art approaches to SHGLR rely on large, complex architectures with a high computational cost. To address this issue, we present an efficient method based on a mixed backbone for YOLOv5, which we build on because it is a lightweight, one-stage framework. The mixed backbone combines 2D and 3D convolutions to obtain temporal information from previous frames. By inflating only specific convolutions of the backbone, the proposed method performs SHGLR on videos at a computational cost similar to that of conventional YOLOv5. We conduct experiments on the IPN Hand dataset, chosen for its challenging, continuous hand gestures. With a 6-frame clip input, our method achieves a frame mAP@0.5 of 66.52%, outperforming conventional YOLOv5 by 7.89% and demonstrating the effectiveness of our approach.
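The abstract does not say how the inflated convolutions are built or initialized, so the following is only an illustrative sketch of one common option: I3D-style inflation, where a 2D kernel is repeated along the time axis and scaled by 1/depth. Under that assumption, a static clip produces the same response as the original 2D convolution. All function names, the single-channel setup, the kernel values, and the temporal depth of 3 are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: I3D-style kernel inflation, not the paper's code.

def conv2d_valid(frame, kernel):
    """Single-channel 2D cross-correlation with no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def inflate_kernel(kernel2d, depth):
    """Repeat a 2D kernel `depth` times along time, scaled by 1/depth."""
    return [[[w / depth for w in row] for row in kernel2d]
            for _ in range(depth)]


def conv3d_valid(clip, kernel3d):
    """Single-channel 3D cross-correlation over (time, height, width)."""
    kt = len(kernel3d)
    result = []
    for t in range(len(clip) - kt + 1):
        # Per-frame 2D responses inside the temporal window, then summed.
        maps = [conv2d_valid(clip[t + d], kernel3d[d]) for d in range(kt)]
        result.append([
            [sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))
        ])
    return result


frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, -1.0]]      # hypothetical 2D weights
clip = [frame] * 6                      # static 6-frame clip
kernel3d = inflate_kernel(kernel, depth=3)

out2d = conv2d_valid(frame, kernel)
out3d = conv3d_valid(clip, kernel3d)    # 4 temporal positions for depth 3
```

On a static clip every temporal position of `out3d` matches `out2d` up to floating-point rounding, which is why this initialization lets an inflated layer start from the behavior of its 2D counterpart; on real video the temporal taps differ and the layer can respond to motion.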
Date of Conference: 23-25 July 2023
Date Added to IEEE Xplore: 22 August 2023