Authors:
Albert Jiménez (1); Lluís Gómez (2) and Joan Llobera (1)
Affiliations:
(1) IZI C/casp 40 Ppal 1, 08010 Barcelona, Spain; (2) Computer Vision Center, Campus UAB, Edifici O, 08193 Cerdanyola del Vallès, Spain
Keyword(s):
Automated Video Edition, Computer Vision, Synchronized Recordings, Multi-camera Recordings, Camera Selection, Attention Mechanism, Pointer Networks.
Abstract:
We propose a computer vision model that paves the way towards a system that automatically creates a video of a live concert by combining multiple recordings from the audience. The automatic editing system divides the editing problem into three parts: synchronizing the recordings with media streaming technology, selecting the scene cut position, and selecting the next shot among the different contributions using an attention-based shot prediction model. We train the shot prediction model on camera transitions in professionally edited videos of concerts, and evaluate it with both an accuracy metric and a human judgement study. Results show that our system selects the same video source as the ground truth in 38.8% of the cases when challenged with a random number of possible sources ranging between 5 and 10. For the rest of the samples, subjective preference between the selected image and the ground truth is at chance level for non-experts. Image editing experts do show better-than-chance performance when asked to predict the next shot selected.
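
To put the 38.8% figure in context, it helps to compare it with what random guessing would achieve. The sketch below is not part of the paper; it assumes that the number of candidate sources is drawn uniformly from the integers 5 to 10 (the abstract only says the number is random within that range) and that exactly one candidate matches the ground truth. The function name `chance_accuracy` is hypothetical.

```python
from fractions import Fraction


def chance_accuracy(min_sources: int = 5, max_sources: int = 10) -> Fraction:
    """Expected accuracy of a uniform random pick of the next shot.

    Assumption (not stated in the abstract): the number of candidate
    sources n is uniform over min_sources..max_sources, and exactly one
    candidate is the ground-truth source, so each trial succeeds with
    probability 1/n.
    """
    counts = range(min_sources, max_sources + 1)
    # Average the per-trial success probability 1/n over all values of n.
    return sum(Fraction(1, n) for n in counts) / len(counts)


if __name__ == "__main__":
    baseline = chance_accuracy()
    reported = 0.388  # accuracy reported in the abstract
    print(f"random-guess accuracy: {float(baseline):.1%}")
    print(f"reported accuracy is {reported / float(baseline):.2f}x the baseline")
```

Under these assumptions the random baseline is roughly 14%, so the reported accuracy is well above it. A different distribution over the number of sources would shift the baseline, but it stays between 10% (always 10 sources) and 20% (always 5 sources).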