Abstract:
Multi-pose human action recognition (HAR) is a pivotal task in computer vision, with applications spanning surveillance, human-robot interaction (HRI), behaviour analysis, and more. When HAR methods are deployed on robotic platforms, traditional approaches often struggle with variations in pose, occlusions, and dynamic backgrounds. In this work, we propose a deep Convolutional Long Short-Term Memory (ConvLSTM) architecture for multi-pose human action recognition on RVD24, an HRI-centric dynamic dataset developed in-house in a laboratory environment. The proposed model comprises four stacked ConvLSTM layers, each with LeakyReLU activation and batch normalisation, followed by a fully connected layer and two dense layers. A Softmax layer at the end of the model predicts the 24 action categories. The central challenge is recognising human actions precisely enough to support further interaction with robots when deployed on a Robot Operating System (ROS) based platform. The proposed deep ConvLSTM method leverages the spatial hierarchies captured by convolutional layers and the temporal dependencies handled by LSTM layers, making it adept at recognising complex actions across varying poses and environments. We evaluated the model on the RVD24 dataset with its 24 action categories, demonstrating its robustness with 82.12% accuracy. The model achieved notable improvements in recognition accuracy over state-of-the-art ConvLSTM models on benchmark action recognition datasets. Our findings suggest that the proposed deep ConvLSTM-based framework is highly effective for multi-pose human action recognition, offering a reliable solution for real-world robotics applications.
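For concreteness, the architecture described in the abstract could be sketched as follows in TensorFlow/Keras. This is a minimal illustration, not the authors' implementation: the filter counts, kernel sizes, dense-layer widths, and input shape (frames, height, width, channels) are assumptions, since the abstract does not specify them.

    # Minimal sketch of the described architecture: four stacked ConvLSTM
    # blocks (each with LeakyReLU and batch normalisation), a fully
    # connected layer, two dense layers, and a 24-way Softmax output.
    # All layer sizes and the input shape are assumed, not from the paper.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_conv_lstm(num_classes=24, input_shape=(16, 32, 32, 3)):
        model = models.Sequential()
        model.add(layers.Input(shape=input_shape))
        # Four stacked ConvLSTM blocks; all but the last return full
        # sequences so each subsequent ConvLSTM still sees a time axis.
        for filters, last in [(16, False), (32, False), (64, False), (64, True)]:
            model.add(layers.ConvLSTM2D(filters, kernel_size=(3, 3),
                                        padding="same",
                                        return_sequences=not last))
            model.add(layers.LeakyReLU())
            model.add(layers.BatchNormalization())
        model.add(layers.Flatten())
        model.add(layers.Dense(256, activation="relu"))  # fully connected layer (width assumed)
        model.add(layers.Dense(128, activation="relu"))  # dense layer 1 (width assumed)
        model.add(layers.Dense(64, activation="relu"))   # dense layer 2 (width assumed)
        model.add(layers.Dense(num_classes, activation="softmax"))
        return model

    model = build_conv_lstm()
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

In this sketch, return_sequences is left enabled on all but the final ConvLSTM layer so that temporal structure is preserved through the recurrent stack before the fully connected stage collapses it for classification.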
Date of Conference: 08-11 September 2024
Date Added to IEEE Xplore: 10 December 2024