Abstract:
Virtual cinematography refers to automatically selecting a natural-looking normal field-of-view (NFOV) from an entire 360° video. It can be modeled as a deep reinforcement learning (DRL) problem, in which an agent takes actions related to NFOV selection according to its environment, the 360° video frames. Moreover, our data analysis shows that the selected NFOVs attract significantly more attention than other regions, i.e., the NFOVs have high saliency. Therefore, in this paper, we propose an attention-based DRL (A-DRL) approach for virtual cinematography in 360° video. Specifically, we develop a new DRL framework for automatic NFOV selection that takes both the content and the saliency map of each 360° frame as input. Then, we propose a new reward function for the DRL framework, which accounts for the saliency values, the ground truth, and smooth transitions in NFOV selection. Subsequently, a simplified DenseNet (called Mini-DenseNet) is designed to learn the optimal policy by maximizing the reward. Based on the learned policy, our A-DRL approach selects the NFOV actions for virtual cinematography of 360° video. Extensive experiments show that our A-DRL approach outperforms other state-of-the-art virtual cinematography methods on the Sports-360 and Pano2Vid datasets.
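The abstract names three ingredients of the reward (saliency values, ground truth, and smooth transition) but does not give its formula. The following is only a minimal sketch of how such a reward could be combined, not the paper's definition. The function names, the equal weights, the 30-degree half field-of-view, and the linear normalization by 180 degrees are all assumptions made for illustration. Viewpoints are (longitude, latitude) pairs in degrees, and the saliency map is an equirectangular grid given as a list of rows.

```python
import math


def angular_distance(a, b):
    """Great-circle distance in degrees between two (lon, lat) viewpoints."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    cos_d = (math.sin(lat1) * math.sin(lat2)
             + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def nfov_saliency(saliency, center, half_fov=30.0):
    """Mean saliency of the pixels within half_fov degrees of the NFOV center.

    half_fov is a hypothetical value, not one taken from the paper.
    """
    height, width = len(saliency), len(saliency[0])
    total, count = 0.0, 0
    for r in range(height):
        lat = 90.0 - (r + 0.5) / height * 180.0
        for c in range(width):
            lon = (c + 0.5) / width * 360.0 - 180.0
            if angular_distance(center, (lon, lat)) <= half_fov:
                total += saliency[r][c]
                count += 1
    return total / count if count else 0.0


def reward(saliency, center, prev_center, gt_center,
           w_sal=1.0, w_gt=1.0, w_smooth=1.0, half_fov=30.0):
    """Illustrative reward: saliency term + ground-truth term - jump penalty.

    The weights and the linear form are assumptions for this sketch.
    """
    r_sal = nfov_saliency(saliency, center, half_fov)
    r_gt = 1.0 - angular_distance(center, gt_center) / 180.0
    r_smooth = -angular_distance(center, prev_center) / 180.0
    return w_sal * r_sal + w_gt * r_gt + w_smooth * r_smooth


# Toy 6x12 saliency map with a salient patch around (lon, lat) = (0, 0).
toy_map = [[0.0] * 12 for _ in range(6)]
for r in (2, 3):
    for c in (5, 6):
        toy_map[r][c] = 1.0

on_target = reward(toy_map, (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
off_target = reward(toy_map, (180.0, 0.0), (0.0, 0.0), (0.0, 0.0))
```

In this toy setting, an NFOV centered on the salient patch, matching the ground truth and not moving from the previous viewpoint, scores higher than one that jumps to the opposite side of the sphere. A DRL agent trained on such a signal would therefore be pushed toward salient, ground-truth-consistent, and temporally smooth viewpoints.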
Published in: IEEE Transactions on Multimedia (Volume: 23)