Exploring CNN-Based Architectures for Multimodal Salient Event Detection in Videos


Abstract:

Multimodal attention plays a significant role in many machine-based understanding, computer vision, and robotic applications, such as action recognition and summarization. In this paper, we present our approach to audio-visual salient event detection, which employs modern Convolutional Neural Network (CNN) based architectures for the visual and audio modalities. This extends our previous work, which examined a hand-crafted frontend (an energy-based synergistic approach) and used a nonparametric classification technique to classify salient vs. non-salient events. Our comparative evaluations over the COGNIMUSE database [1], which consists of movies and travel documentaries together with ground-truth data denoting the perceptually mono- and multimodal salient events, provide strong evidence that the CNN-based approach, even in this task, outperforms the hand-crafted frontend in almost all cases for all modalities (i.e., audio, visual, and audiovisual), achieving good average results.
Date of Conference: 10-12 June 2018
Date Added to IEEE Xplore: 30 August 2018
Conference Location: Aristi Village, Greece
