
YOLOv5 with Mixed Backbone for Efficient Spatio-Temporal Hand Gesture Localization and Recognition



Abstract:

Spatio-temporal Hand Gesture Localization and Recognition (SHGLR) refers to analyzing the spatial and temporal aspects of hand movements to detect and identify hand gestures in a video. Current state-of-the-art approaches for SHGLR rely on large, complex architectures that incur a high computational cost. To address this issue, we present a new efficient method based on a mixed backbone for YOLOv5, which we chose because it is a lightweight, one-stage framework. The mixed backbone combines 2D and 3D convolutions to capture temporal information from previous frames. The proposed method offers an efficient way to perform SHGLR on videos by inflating specific convolutions of the backbone while keeping a computational cost similar to the conventional YOLOv5. We conduct experiments on the IPN Hand dataset because of its challenging, continuous hand gestures. Our proposed method achieves a frame mAP@0.5 of 66.52% with a 6-frame clip input, outperforming conventional YOLOv5 by 7.89% and demonstrating the effectiveness of our approach.
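
The abstract does not give implementation details, but the idea of "inflating" selected 2D convolutions of a backbone into 3D convolutions is well established (e.g., I3D-style weight inflation). The sketch below is a minimal, hypothetical illustration of that operation in PyTorch, assuming a 6-frame clip input; the layer, the temporal kernel size, and the variable names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumption, not the authors' implementation): inflate a 2D
# convolution from a YOLOv5-style backbone into a 3D convolution so it can
# aggregate temporal context from a short clip of previous frames.
import torch
import torch.nn as nn


def inflate_conv2d(conv2d: nn.Conv2d, time_kernel: int = 3) -> nn.Conv3d:
    """Turn a 2D conv into a 3D conv by replicating its weights along time."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_kernel, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_kernel // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # Repeat the 2D kernel across the temporal dimension and rescale so the
        # initial response on a static clip matches the original 2D layer.
        weight3d = conv2d.weight.unsqueeze(2).repeat(1, 1, time_kernel, 1, 1)
        conv3d.weight.copy_(weight3d / time_kernel)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d


# Example: a 6-frame clip (batch, channels, time, H, W) through an inflated stem.
stem2d = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
stem3d = inflate_conv2d(stem2d)
clip = torch.randn(1, 3, 6, 640, 640)
features = stem3d(clip)
print(features.shape)  # torch.Size([1, 32, 6, 320, 320])
```

Because only selected layers are inflated and the remaining layers stay 2D, the extra cost over a frame-by-frame YOLOv5 stays modest, which is consistent with the efficiency claim in the abstract.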
Date of Conference: 23-25 July 2023
Date Added to IEEE Xplore: 22 August 2023
Conference Location: Hamamatsu, Japan
