AudioVisual Video Summarization


Abstract:

Audio and vision are the two main modalities in video data. Multimodal learning, especially audiovisual learning, has drawn considerable attention recently and can boost the performance of various computer vision tasks. However, most existing video summarization approaches exploit only the visual information and neglect the audio information. In this brief, we argue that the audio modality can help the visual modality better capture the video content and structure, and thereby benefit the summarization process. Motivated by this, we propose to jointly exploit audio and visual information for the video summarization task and develop an audiovisual recurrent network (AVRN) to achieve this. Specifically, the proposed AVRN consists of three parts: 1) a two-stream long short-term memory (LSTM) that encodes the audio and visual features sequentially by capturing their temporal dependencies; 2) an audiovisual fusion LSTM that fuses the two modalities by exploring the latent consistency between them; and 3) a self-attention video encoder that captures the global dependencies in the video. Finally, the fused audiovisual information and the integrated temporal and global dependencies are jointly used to predict the video summary. Experimental results on two benchmarks, SumMe and TVSum, demonstrate the effectiveness of each part and the superiority of AVRN over approaches that exploit only visual information for video summarization.
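
The three-part data flow described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the layer sizes, the fusion rule, the input to the self-attention encoder, or the prediction head. The sketch assumes that fusion concatenates the per-frame hidden states of the two streams, that self-attention is applied to the raw visual features without learned projections, and that a sigmoid over a linear layer produces per-frame importance scores. All names (TinyLSTM, self_attention, avrn_sketch) are hypothetical, and the weights are random and untrained.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyLSTM:
    """Minimal bias-free LSTM with random, untrained weights (illustration only)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t ; h_{t-1}].
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                      for _ in range(hid_dim)] for g in "ifog"}

    def run(self, seq):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        outputs = []
        for x in seq:
            z = x + h  # list concatenation: [x_t ; h_{t-1}]
            i = [sigmoid(v) for v in matvec(self.W["i"], z)]
            f = [sigmoid(v) for v in matvec(self.W["f"], z)]
            o = [sigmoid(v) for v in matvec(self.W["o"], z)]
            g = [math.tanh(v) for v in matvec(self.W["g"], z)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
            outputs.append(h)
        return outputs

def self_attention(seq):
    """Scaled dot-product self-attention without learned projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out

def avrn_sketch(audio, visual, hid_dim=4, seed=0):
    """Return one importance score in (0, 1) per frame."""
    rng = random.Random(seed)
    # Part 1: two-stream LSTM, one stream per modality.
    h_audio = TinyLSTM(len(audio[0]), hid_dim, rng).run(audio)
    h_visual = TinyLSTM(len(visual[0]), hid_dim, rng).run(visual)
    # Part 2: fusion LSTM over concatenated hidden states (assumed fusion rule).
    fusion_in = [a + v for a, v in zip(h_audio, h_visual)]
    fused = TinyLSTM(2 * hid_dim, hid_dim, rng).run(fusion_in)
    # Part 3: self-attention for global dependencies (assumed: visual features).
    global_ctx = self_attention(visual)
    # Prediction head (assumed): linear layer plus sigmoid on [fused ; global].
    w = [rng.uniform(-0.1, 0.1) for _ in range(hid_dim + len(visual[0]))]
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, f + g)))
            for f, g in zip(fused, global_ctx)]

# Toy input: 5 frames, 3-dim audio features, 6-dim visual features.
data_rng = random.Random(1)
audio = [[data_rng.random() for _ in range(3)] for _ in range(5)]
visual = [[data_rng.random() for _ in range(6)] for _ in range(5)]
scores = avrn_sketch(audio, visual)
```

Here scores holds one value per frame. In a trained model such scores would typically be thresholded or passed to a selection step to form the summary; that step is outside what the abstract describes and is not sketched here.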
Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 34, Issue: 8, August 2023)
Page(s): 5181 - 5188
Date of Publication: 25 October 2021

PubMed ID: 34695009
