A Space Information-Enhanced Dense Video Caption for Indoor Human Action Recognition | IEEE Conference Publication | IEEE Xplore

Abstract:

Dense video captioning detects events of interest in untrimmed videos and generates descriptive text for each event. This technology has potential applications in security surveillance and human care. However, current methods often overlook the relationships between objects in the video, which limits their applicability and makes them hard to adapt to specific domains such as video summarization for indoor human activities, where human activities are closely intertwined with the objects in the scene. In this paper, we propose a plug-and-play module that enhances existing dense video captioning methods with spatial information. Specifically, we extract spatial information about objects of interest from Red-Green-Blue-Depth (RGB-D) images and image segmentation results. We then integrate this information into the captions generated by the Dense Video Captioning (DVC) method using a fine-tuned Large Language Model (LLM). We evaluate our model on a custom dataset and show that the system provides a convenient and effective way to obtain space-enhanced captions.
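
As a rough illustration of what "spatial information from RGB-D images and segmentation results" could look like in practice, the sketch below back-projects the depth pixels under a segmentation mask with a standard pinhole camera model and reports the distance between two objects. It is not the authors' implementation: the function names `object_centroid` and `describe_distance`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the use of a median centroid are all assumptions made for this example. The abstract does not specify how the spatial information is computed or how it is passed to the LLM.

```python
import math
from statistics import median


def object_centroid(depth, mask, fx, fy, cx, cy):
    """Return a robust 3D centroid (x, y, z) in metres for one segmented object.

    depth: 2D list of depth values in metres (0 means no reading).
    mask:  2D list of booleans with the same shape, True inside the object.
    fx, fy, cx, cy: pinhole intrinsics (hypothetical values supplied by the caller).
    """
    xs, ys, zs = [], [], []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:  # ignore pixels outside the mask or without depth
                xs.append((u - cx) * z / fx)  # pinhole back-projection
                ys.append((v - cy) * z / fy)
                zs.append(z)
    if not zs:
        return None  # object has no valid depth pixels
    return (median(xs), median(ys), median(zs))


def describe_distance(name_a, centroid_a, name_b, centroid_b):
    """Format the Euclidean distance between two centroids as a short phrase."""
    dist = math.dist(centroid_a, centroid_b)
    return f"{name_a} is {dist:.2f} m from {name_b}"


# Toy 2x2 frame: every pixel is 2 m away; each object covers one pixel.
depth = [[2.0, 2.0], [2.0, 2.0]]
person_mask = [[True, False], [False, False]]
chair_mask = [[False, True], [False, False]]

person = object_centroid(depth, person_mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
chair = object_centroid(depth, chair_mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
phrase = describe_distance("person", person, "chair", chair)
```

A phrase of this kind could then be supplied, together with the original caption, as input to a language model that rewrites the caption. That step is left out here because the abstract gives no detail on the prompt or the fine-tuning setup.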
Date of Conference: 12-14 January 2024
Date Added to IEEE Xplore: 04 September 2024
Conference Location: Shanghai, China
