A Space Information-Enhanced Dense Video Caption for Indoor Human Action Recognition

Abstract:

Dense video captioning detects events of interest in untrimmed videos and generates a text description for each one. The technology has potential applications in security surveillance and human care. However, current methods often overlook the relationships between objects in the video, which limits their applicability and makes them hard to adapt to specific domains such as video summarization for indoor human activities, where human activities are closely intertwined with the objects in the scene. In this paper, we propose a plug-and-play module that enhances existing dense video captioning (DVC) methods with spatial information. Specifically, we extract spatial information about the objects of interest from Red-Green-Blue-Depth (RGB-D) images and image segmentation results. We then integrate this information into the captions generated by the DVC method using a fine-tuned Large Language Model (LLM). We evaluate our model on a custom dataset and demonstrate that our system provides a convenient and effective way to obtain space-enhanced captions.
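
The abstract does not say how the spatial information is computed, so the following is only an illustrative sketch of one common way to combine a depth image with a segmentation mask, not the authors' method. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, depth values in metres, and a boolean mask per object. The names mask_centroid_3d and spatial_hint are hypothetical. The resulting sentence is the kind of spatial hint that could be passed to a language model together with a DVC caption.

```python
from math import dist


def mask_centroid_3d(depth, mask, fx, fy, cx, cy):
    """Back-project masked pixels with a pinhole model and average them.

    depth: 2D list of depths in metres (0 means no reading); assumed format.
    mask:  2D list of booleans, e.g. from any segmentation model.
    Returns the (x, y, z) centroid in camera coordinates, or None if the
    mask covers no pixel with a valid depth.
    """
    sum_x = sum_y = sum_z = 0.0
    count = 0
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                sum_x += (u - cx) * z / fx  # pixel column -> camera x
                sum_y += (v - cy) * z / fy  # pixel row -> camera y
                sum_z += z
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count, sum_z / count)


def spatial_hint(name_a, centroid_a, name_b, centroid_b):
    """Turn two 3D centroids into a short sentence (hypothetical format)."""
    distance = dist(centroid_a, centroid_b)
    return f"The {name_a} is about {distance:.1f} m from the {name_b}."


# Toy 2x2 frame: every pixel is 2 m away, one pixel per object.
depth = [[2.0, 2.0], [2.0, 2.0]]
person_mask = [[True, False], [False, False]]
chair_mask = [[False, True], [False, False]]

person = mask_centroid_3d(depth, person_mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
chair = mask_centroid_3d(depth, chair_mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
hint = spatial_hint("person", person, "chair", chair)
print(hint)
```

Averaging over the whole mask is a simplifying assumption; a real system would likely need to handle depth noise and partial occlusion.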

Published in: 2024 8th International Conference on Robotics, Control and Automation (ICRCA)

Date of Conference: 12-14 January 2024

Date Added to IEEE Xplore: 04 September 2024

DOI: 10.1109/ICRCA60878.2024.10649311

Conference Location: Shanghai, China