Predicting audio-visual salient events based on visual, audio and text modalities for movie summarization | IEEE Conference Publication