Abstract:
Video memorability is a cornerstone of social media platform analysis, as a highly memorable video is more likely to be noticed and shared. This paper proposes a new framework that fuses multi-modal information to predict the likelihood of a video being remembered. The proposed framework relies on late fusion of text, visual, and motion features. Specifically, two neural networks extract features from the captions describing the video's content; two ResNet models extract visual features from specific frames; and two 3D ResNet models, combined with Fisher Vectors, extract features from the video's motion information. The extracted features are used to compute several memorability scores via Bayesian Ridge regression, which are then fused based on a greedy search for the optimal fusion parameters. Experiments on the MediaEval 2019 dataset demonstrate the superiority of the proposed framework, which outperforms the state of the art.
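The late-fusion idea described above can be sketched in a few lines: fit one Bayesian Ridge regressor per modality, then search for fusion weights that maximize rank correlation on a validation set. This is a minimal illustration, not the authors' implementation: the synthetic features, the 16-dimensional feature size, and the coarse grid search (standing in for the paper's greedy search) are all assumptions for the sake of a runnable example.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-modality features (text, visual, motion)
# and ground-truth memorability scores; real inputs would come from the
# caption encoders, ResNets, and 3D ResNet + Fisher Vector extractors
# described in the abstract.
n_train, n_val = 200, 100
y = rng.random(n_train + n_val)
y_train, y_val = y[:n_train], y[n_train:]

modalities = {}
for name in ("text", "visual", "motion"):
    X = rng.normal(size=(n_train + n_val, 16))
    X[:, 0] = y + 0.5 * rng.normal(size=n_train + n_val)  # weak signal
    modalities[name] = (X[:n_train], X[n_train:])

# One Bayesian Ridge regressor per modality -> per-modality scores.
preds = {}
for name, (X_tr, X_va) in modalities.items():
    model = BayesianRidge().fit(X_tr, y_train)
    preds[name] = model.predict(X_va)

# Search over fusion weights (sum to 1), maximizing Spearman's rho --
# the official MediaEval memorability metric.
names = list(preds)
best_w, best_rho = None, -1.0
grid = np.linspace(0.0, 1.0, 11)
for w1 in grid:
    for w2 in grid:
        if w1 + w2 > 1.0:
            continue
        w = (w1, w2, 1.0 - w1 - w2)
        fused = sum(wi * preds[n] for wi, n in zip(w, names))
        rho = spearmanr(fused, y_val).correlation
        if rho > best_rho:
            best_w, best_rho = w, rho

print("best weights:", dict(zip(names, best_w)), "rho = %.3f" % best_rho)
```

In practice each modality's regressor would be trained on its own extracted features, and the weight search would use a held-out split of the MediaEval data rather than synthetic targets.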
Date of Conference: 19-22 September 2021
Date Added to IEEE Xplore: 23 August 2021