A Multimodal Frame Sampling Algorithm for Semantic Hyperlapses with Musical Alignment | IEEE Conference Publication | IEEE Xplore

Abstract:

Producing visually engaging and semantically meaningful hyperlapses presents unique challenges, particularly when integrating an audio track to enhance the watching experience. This paper introduces a novel multimodal algorithm to create hyperlapses that optimize semantic content retention, visual stability, and the alignment of playback speed to the liveliness of an accompanying song. We use object detection to estimate the semantic importance of each frame and analyze the song's perceptual loudness to determine its liveliness. Then, we align the most important segments of the video—where the hyperlapse slows down—with the quieter parts of the song, signaling a shift in attention from the music to the video. Our experiments show that our approach outperforms existing methods in semantic retention and loudness-speed correlation, while maintaining comparable performance in camera stability and temporal continuity.
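The relationship described in the abstract, where playback slows on important segments and speed should track the song's loudness, can be illustrated with a minimal sketch. This is not the paper's algorithm. The linear importance-to-speed mapping, the speed bounds, the use of Pearson correlation as the loudness-speed measure, and all function names and sample values below are assumptions chosen for illustration only.

```python
from statistics import mean


def segment_speedups(importance, min_speed=2.0, max_speed=12.0):
    """Map per-segment importance scores to playback speed-ups.

    Assumption: a plain linear mapping, so the most important segment
    plays at min_speed and the least important at max_speed.
    """
    lo, hi = min(importance), max(importance)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return [
        max_speed - (score - lo) / span * (max_speed - min_speed)
        for score in importance
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-segment values, already time-aligned between video and song.
importance = [0.1, 0.9, 0.3, 0.7]  # e.g. derived from object detections
loudness = [0.8, 0.2, 0.7, 0.3]    # e.g. normalized perceptual loudness

speeds = segment_speedups(importance)
correlation = pearson(loudness, speeds)
```

In this toy input the important segments coincide with the quiet parts of the song, so the speed-ups and the loudness rise and fall together and the correlation is strongly positive. A real system would also have to choose which frames to keep so that this alignment holds while preserving camera stability and temporal continuity, which the sketch does not attempt.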
Date of Conference: 30 September 2024 - 03 October 2024
Date Added to IEEE Xplore: 18 October 2024
Conference Location: Manaus, Brazil
